Most econometricians would support the view that econometrics is learned
by doing. Unfortunately, many texts provide too few problems, and, more 
important, the solutions are not well developed or well documented. Phillips 
and Wickens attempt to fill this void by providing a series of problems, 
some taken from B.A. and M.A. university examinations, many of which 
have detailed, carefully worked out solutions. To encourage people to 
address the questions without looking at the answers, the former are all 
grouped together followed by the solutions. From a pedagogical view, a 
question-answer format would have been preferable. The authors, however, 
do not view these two volumes as a text but as supplements to standard 
works. Some prior knowledge of econometrics and statistics is a prerequisite 
for volume 1, which covers most of the usual topics in an undergraduate 
course. Volume 2 requires more mathematics and centers on systems of 
equations and dynamic models. This book would serve as a useful reference 
for undergraduates, graduates, and postgraduates. 
Preface


This book provides a set of worked and unworked exercises to supplement 
the main textbook material in econometrics. It is written partly for 
students who are commencing undergraduate work in econometrics and 
who have some prior knowledge of statistics, and partly for students who 
are undertaking more advanced undergraduate and graduate instruction. 
We also hope that the book will prove useful to teachers by providing 
material for classroom discussion and to research workers by going a small 
way towards bridging the gap between the textbook and the rapidly 
expanding literature in this field. 

The book attempts to supplement the existing econometric literature in 
two ways. First, in our experience, students very often find that the textbook
they are assigned does not prepare them adequately to solve the sort
of problems they face in examinations. They are not shown in a direct 
way how the theory can be used to solve such problems and they are not 
provided with similar questions which they can attempt on their own. 
Second, with the increasing output of econometric research, the coverage 
of the established econometrics textbooks is seen to be less complete. This 
problem is of particular concern to advanced students searching for a 
recent overview of the subject and to beginning research workers who 
often feel the need for a simpler introduction to the recent literature, 
more particularly in the technical areas. 

In this book we provide exercises both on econometric theory and 
applied econometrics that cover most of the material in introductory and 
advanced econometric textbooks. In addition, we have constructed 
problems based on more recent research in order to introduce the reader 
to some new results. As far as possible, we have included more than one 
question on each topic. We have provided a fairly detailed solution to (at 
least) one of these questions; the other questions are intended to be 
supplementary and are left either for the reader to answer on his own or 
for classroom discussion. 

The material within each chapter is organised in the same way. At the 
start of each chapter we have a brief introduction to the subject matter of 
the chapter and, where this is appropriate, an outline of the notation that
is to be used. The next two sections contain all of the questions in the 
chapter: first, those questions for which worked solutions are provided; 
and, second, the supplementary questions. The final section in each 
chapter contains the solutions. The questions and solutions are separated 
to encourage the reader to attempt the exercises alone before looking at
the solutions that are provided. The questions are arranged and the 
solutions are written to enable the reader to proceed naturally from one 
question to the next. Knowledge of topics which are treated later in the 
book is not usually required. Where appropriate, the solution includes 
some discussion of the relevant econometric theory; and we have made 
such discussions more detailed in those solutions which relate to more
recent research. Naturally, when the theory has already been explained in 
an earlier solution, it is not repeated, but reference is made to that previous 
solution. Moreover, to assist those readers who will wish to use the book 
in conjunction with a textbook, we have given references, as far as possible, 
to the major textbooks. 

Although the questions are arranged in order of their topic, they are 
not always arranged in order of difficulty. But, when a topic is first introduced,
it is usually done through simpler questions; and we have sometimes
used numerical exercises to clarify the manipulations that are involved in 
a particular procedure. Subsequent questions on the same topic then tend 
to be of increasing difficulty. Therefore, both beginning and advanced 
students should find questions of interest throughout the book. Some 
questions deal with various aspects of applied work in econometrics and 
these questions are included towards the end of each chapter. 

The book is divided into two volumes and this has enabled us to cover 
a fairly wide range of topics in the exercises. Volume I covers most of the 
usual textbook material dealing with regression techniques and their application
in econometrics. Chapter 1 of this volume is concerned with a
number of methodological issues that arise in the use of econometric 
techniques including the underlying concept of an econometric model and 
fundamental problems such as aggregation, causality and the distinction 
between recursive and interdependent systems. We also consider the problem
of extracting and comparing the sampling distribution of least squares
and other estimators in simple bivariate models allowing for serial correlation
and errors of measurement. A large part of the remainder of the
volume concentrates on methods of estimation and inference that apply 
to models that are linear in both variables and parameters (Chapters 2 and 
4). There are many exercises illustrating standard results on the linear 
regression model. Most of these are contained in Chapter 2 and a number 
of extensions dealing with autocorrelated errors, multicollinearity, missing 
observations and seasonal adjustment are given in Chapter 4. We also consider
multiple equation models with across equation parameter restrictions
(Chapter 3) and covariance and error component models (Chapter 4). In 
Chapter 5 we deal with non-linear regression models and models with
errors in variables. 

Volume II contains two chapters and deals with models of simultaneous 
equations (Chapter 6) and dynamic models (Chapter 7). Chapter 6 contains
a number of introductory questions on identifiability and the use of
single equation estimators such as two stage least squares and limited information
maximum likelihood. A number of numerical questions have
been included, as in Volume I, to lay out the sequence of manipulations 
needed to compute estimates and confidence intervals. The remaining 
questions in Chapter 6 deal with more advanced topics such as identifiability
in the presence of cross equation parameter restrictions, systems
methods of estimation, some simpler finite sample theory (Nagar’s moment 
approximations and Edgeworth approximations) and an introduction to 
non-linear simultaneous equations models. Chapter 7 covers problems 
such as the identification of parameters in dynamic models with serially 
correlated errors, the consistent estimation of dynamic models with 
serially correlated errors, distributed lag models, continuous time models 
and models of markets in disequilibrium, as well as a number of applied 
questions. Much of the material in Chapter 7 has not yet appeared in 
textbooks and will, we hope, be of particular interest to advanced 
students and research workers. 

Two major omissions from the book should also be noted: time series 
regression by spectral methods and the use of Bayesian methods in econometrics.
These omissions were made with reluctance through pressure of
space and time. 

We are greatly indebted to the University of Auckland, the Australian 
National University and the Universities of Birmingham, Bristol, Essex, 
London and York for their permission to use questions which have 
appeared in their examinations. Where we have used questions from these 
examinations, or adapted them for our purpose here, this has been clearly 
indicated. Although it is impossible to give due credit to particular people 
in such cases, we acknowledge with special thanks that the following are 
among the authors of some of these questions: A.R. Bergstrom, R. Bowden, 
A. Chesher, J. Durbin, L.G. Godfrey, D.F. Hendry, G. Mizon, A.R. Pagan, 
J. Richmond, J.D. Sargan and K.F. Wallis. We are also grateful to 
J. Richmond, W. Barnett, V.B. Hall, E. Maasoumi and M. Prior for their 
comments on earlier versions of some of the questions and solutions. They 
are, of course, absolved from blame for any of the errors that remain. 

Finally, it is with great pleasure that we thank Mrs Lucy Lowther, 
Mrs Sheila Ogden and Mrs Phyllis Pattenden for their skill, patience and 
good spirits in preparing the typescript. 


P.C.B. PHILLIPS 
M.R. WICKENS 


July 1978 
CHAPTER 1


Concepts, methods and models 


0. INTRODUCTION 


In this chapter we consider the underlying principles of the econometric 
method, introduce the concept of an econometric model and illustrate 
how the theory of probability and the methods of statistical inference can 
be used in economics. The necessary notation will be given in each 
question. 


1. QUESTIONS 


Question 1.1 


Econometrics has been defined as follows: 


(a) ‘The quantitative analysis of actual economic phenomena based on 
the concurrent development of theory and observations related by an 
appropriate method of inference’, Samuelson (1954). 

(b) ‘The main objective of econometrics is to give empirical content to
a priori reasoning in economics’, Klein (1962).

(c) ‘The aim of econometrics (is) the empirical determination of 
economic laws. Econometrics rounds off theory by using numerical data 
to verify the existence of postulated relationships and to define their 
precise forms’, Malinvaud (1966). 


In the light of these definitions discuss the nature of econometrics and 
distinguish the main activities with which it is concerned. 


Question 1.2 


The model of a certain market is 
$q_t^d = \alpha_0 - \alpha_1 p_t + \alpha_2 y_t + u_{1t}$,   (1.2.1)
$q_t^s = \beta_0 + \beta_1 p_t + \beta_2 t + u_{2t}$,   (1.2.2)
$q_t^d = q_t^s = q_t$,   (1.2.3)


where $q_t^d$ is quantity demanded, $q_t^s$ is quantity supplied, $q_t$ is quantity
exchanged, $p_t$ is price, $y_t$ is income and $u_{1t}$ and $u_{2t}$ are random
disturbances with zero means.


(a) Obtain the reduced form of the above model, interpret it, and explain 
its usefulness in forecasting. 


(b) Find the expected change in $q$ and $p$ that would occur in period one as
a result of introducing an excise tax at a rate of $r_1 = 1$, where $\alpha_0 = 10$,


ay = 0.5, On OnGs = 0, By = 0.5, B, = i and Vat 100. 


(c) What tax rate is required in order to achieve in period one an expected
price of 20? What is the expected value of $q_1$?


Question 1.3 
The model 
$c_{it} = \alpha_i + \beta_i y_{it} + u_{it}$   $(i = 1, \ldots, n;\ t = 1, \ldots, T)$   (1.3.1)


explains the consumption $c_{it}$ of the $i$th household at time $t$. The income
of the $i$th household at time $t$ is $y_{it}$ and $u_{it}$ is a disturbance term with zero
mean, $E(u_{it}^2) = \sigma^2$ and $E(u_{it}u_{js}) = 0$ for all $i$ and $j$ and for $t \neq s$. Denote
total consumption and total income at time $t$ by $c_t = \sum_{i=1}^{n} c_{it}$ and
$y_t = \sum_{i=1}^{n} y_{it}$, respectively, and let $\lambda_{it}$ be the proportion of total income received
by the $i$th household at time $t$.
(a) Derive the aggregate consumption function at time t and comment on 
its properties if: 

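(i) $\lambda_{it}$ is non-random; and

(ii) $\lambda_{it} = \lambda_i + \epsilon_{it}$, where $\lambda_i$ is non-random, $E(\epsilon_{it}) = 0$, $E(\epsilon_{it}^2) = \omega^2$ and
$E(\epsilon_{it}\epsilon_{js}) = 0$ for all $i$ and $j$ and all $t \neq s$.

(b) If $\alpha_i = 0$ for all $i$ and $y_t$ is non-random, derive the mean and variance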
of 


$b = \sum_{t=1}^{T} y_t c_t \Big/ \sum_{t=1}^{T} y_t^2$   (1.3.2)
in cases (i) and (ii) above. 




Question 1.4 


(a) What do you understand by the following terms in the context of an 
economic model: 
(i) causal relationship,
(ii) recursive model, 
(iii) interdependent model? 


(b) In the following model $Y$ is real national income, $Y^e$ is expected real
national income, $E$ is real private expenditure, $E^e$ is expected real private
expenditure and $G$ is real government expenditure. The subscript $t$ or
$t - 1$ indicates the relevant time period and a bar over a variable indicates
that the variable is autonomous (or exogenously determined):


Ve ha ps oD (21) 
EieSecr; +d. (1.4.2) 
$E_t = E_t^e$   (1.4.3)
$G_t = \bar{G}_t$   (1.4.4)
$Y_t = E_t + G_t$   (1.4.5)


Examine each of the relations in the above model for a causal 
interpretation. Is the model recursive? 


Question 1.5 
In the model 
$y_t = 3x_t + u_t$   $(t = 1, 2)$   (1.5.1)


the $y_t$ are observable random variables, the $x_t$ are known to take on the
fixed values $x_1 = 1$, $x_2 = 2$, and the $u_t$ are random errors which have the
following discrete probability distribution for each value of $t$:


Ut Probability 
1 z 
(a) If $u_1$ and $u_2$ are statistically independent, find the sampling
distributions of the following two estimators of the slope coefficient in
(1.5.1):

$a_1 = \sum_{t=1}^{2} y_t \Big/ \sum_{t=1}^{2} x_t$ and $a_2 = \sum_{t=1}^{2} y_t x_t \Big/ \sum_{t=1}^{2} x_t^2$   (1.5.2)


Show that var($a_1$) > var($a_2$).


(b) Suppose that $u_1$ and $u_2$ are no longer statistically independent but
have instead the following joint probability distribution: 
$(u_1, u_2)$ Probability


(le) 10 
(iceman ty) a 
(qld) 1 
(alah) 10 i 


Find the new sampling distributions of the estimators $a_1$ and $a_2$ and
verify that, in this case, var($a_1$) < var($a_2$).
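
The enumeration technique that both parts of this question call for can be sketched in a few lines of Python. The sketch assumes, purely for illustration, that each $u_t$ takes the values -1 and +1 with probability 1/2 (a hypothetical distribution, not the one tabulated in the question) and that the two estimators are the ratio of sums and least squares through the origin. The names `U_DIST`, `independent_joint`, `a1`, `a2` and `sampling_distribution` are ours.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

X = (Fraction(1), Fraction(2))   # fixed values x_1, x_2
SLOPE = Fraction(3)              # slope coefficient as read from (1.5.1)

# HYPOTHETICAL marginal distribution of each u_t (an assumption for illustration).
U_DIST = {Fraction(-1): Fraction(1, 2), Fraction(1): Fraction(1, 2)}


def independent_joint(marginal):
    """Joint distribution of (u_1, u_2) when the two errors are independent."""
    return {(u1, u2): p1 * p2
            for (u1, p1), (u2, p2) in product(marginal.items(), repeat=2)}


def a1(y, x):
    """Ratio-of-sums estimator: sum(y) / sum(x)."""
    return sum(y) / sum(x)


def a2(y, x):
    """Least squares through the origin: sum(x*y) / sum(x^2)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


def sampling_distribution(estimator, joint):
    """Map each possible estimate to its probability by enumerating (u_1, u_2)."""
    dist = defaultdict(Fraction)
    for (u1, u2), prob in joint.items():
        y = (SLOPE * X[0] + u1, SLOPE * X[1] + u2)
        dist[estimator(y, X)] += prob
    return dict(dist)


def mean_var(dist):
    """Mean and variance of a discrete distribution given as {value: prob}."""
    mean = sum(v * p for v, p in dist.items())
    var = sum((v - mean) ** 2 * p for v, p in dist.items())
    return mean, var


joint = independent_joint(U_DIST)
mean_a1, var_a1 = mean_var(sampling_distribution(a1, joint))
mean_a2, var_a2 = mean_var(sampling_distribution(a2, joint))
print(mean_a1, var_a1)
print(mean_a2, var_a2)
```

Under the assumed distribution both estimators have mean 3, with variances 2/9 and 1/5 respectively. For a dependent case such as part (b), a joint table can be passed directly as a dictionary keyed by `(u1, u2)` in place of `independent_joint(U_DIST)`.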


Question 1.6 
In the model 
$y_t = x_t + u_t$   $(t = 1, 2)$   (1.6.1)


the $y_t$ are observable random variables; and the $u_t$ are serially independent
random errors which have the following discrete probability distribution
for each value of $t$:


Ut Probability 
1 z 


The $x_t$ in (1.6.1) are exogenous and take on the fixed values $x_1 = 0$ and
x2 = 1. But x, is observed with error and we observe instead 


X, =x, +2 (1.6.2) 


where v is a random (measurement) error with the following probability 
distribution 


v Probability 
; 
—eE 5 
where 0 <e€ <#, and v is statistically independent of u,and uy. 
(a) Find the sampling distributions of the following three estimators of 
the slope coefficient (of unity) in (1.6.1): 
Ley Xi yi + X29 y2hee 


a 2:4 = eee 163 
Oe ee ee ean g AneSie 
(b) Compare the relative concentration of the estimators in (1.6.3) about 
the true value of the slope coefficient by computing for each estimator 
the probability that it lies in the following intervals: (i) [0.75, 1.25], and


(ii) (0, 2]. 
Question 1.7


In the following model of a free market for a certain good 


Cae Lh rt (71) 
q = 3=pta (17.24 
Ga= (1.7.3) 


price, $p$, is determined in each period so that quantity supplied, $q^s$, equals
quantity demanded, $q^d$. The random disturbances $u$ and $v$ are statistically
independent and have the following probability distributions


u Probability v Probability 


: 4 
0 4 0 H 
=A ‘iaiietia , 


(a) Find the probability distribution of the free market price p in 
(1.7.1—3) and determine its expected value. 


(b) Suppose the government intervenes in the market by buying (or 
selling) an amount g of the good to ensure that price is fixed at its 
expected value (as determined in (a)). What is the probability distribution 
of g? Find the probability that government purchases of the good are at 
least as great as private demand, $q^d$.


Question 1.8 
A firm uses the following model to forecast the demand for its product: 
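$Q_t = \alpha + \beta Y_t + u_{1t}$,   (1.8.1)
$C_t = \gamma + \delta Y_t + u_{2t}$,   (1.8.2)
$I_t = \lambda(\theta Y_{t-1} - K_{t-1}) + u_{3t}$,   (1.8.3)
$Y_t = C_t + I_t$,   (1.8.4)
$K_t = K_{t-1} + I_t$,   (1.8.5)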
where 

$Q_t$ = number of units of product demanded in year $t$,

$Y_t$ = real national income in year $t$,

$C_t$ = real consumption in year $t$,

$I_t$ = real investment in year $t$,

$K_t$ = real stock of capital at the end of year $t$,

$u_{1t}$, $u_{2t}$, $u_{3t}$ = random disturbances in year $t$.
The disturbances $u_{1t}$, $u_{2t}$ and $u_{3t}$ are assumed to be normally distributed
with zero means and covariance matrix


The estimated values of the parameters in this model are: $\alpha = 100$, $\beta = 50$,
$\gamma = 10$, $\delta = 0.5$, $\lambda = 0.5$, $\theta = 3$, $\sigma_{11} = 100$, $\sigma_{22} = 1$, $\sigma_{23} = 0.5$, $\sigma_{33} = 1$.
The real national income in 1976 was £30 (thousand million) and the real 
stock of capital at the end of 1976 was £75 (thousand million). 

Assuming that the true values of the parameters are equal to the above 
estimates and treating $Y_{1976} = 30$ and $K_{1976} = 75$ as known non-random
numbers, forecast the value of $Q_{1977}$ and find an interval within which
there is a 0.95 probability that $Q_{1977}$ will lie.
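
The arithmetic behind a forecast of this kind can be sketched as follows. The sketch rests on two assumptions that should be checked against the model as printed: that investment is driven by lagged income and capital, $I_t = \lambda(\theta Y_{t-1} - K_{t-1}) + u_{3t}$, and that $u_{1t}$ is uncorrelated with $u_{2t}$ and $u_{3t}$ (so $\sigma_{12} = \sigma_{13} = 0$). All variable names are ours.

```python
import math

# Parameter values quoted in the question.
alpha, beta = 100.0, 50.0
gamma, delta = 10.0, 0.5
lam, theta = 0.5, 3.0
s11, s22, s23, s33 = 100.0, 1.0, 0.5, 1.0

# ASSUMPTION: u1 is uncorrelated with u2 and u3.
s12 = s13 = 0.0

Y_1976, K_1976 = 30.0, 75.0

# ASSUMED investment equation: I_t = lam * (theta * Y_{t-1} - K_{t-1}) + u3.
I_mean = lam * (theta * Y_1976 - K_1976)

# Substitute C = gamma + delta*Y + u2 into Y = C + I and solve for Y:
#   Y = (gamma + I_mean + u2 + u3) / (1 - delta)
multiplier = 1.0 / (1.0 - delta)
Y_mean = multiplier * (gamma + I_mean)

# Q = alpha + beta*Y + u1, so the forecast error is u1 + k*(u2 + u3).
k = beta * multiplier
Q_forecast = alpha + beta * Y_mean
Q_var = s11 + k ** 2 * (s22 + 2.0 * s23 + s33) + 2.0 * k * (s12 + s13)
Q_sd = math.sqrt(Q_var)

z = 1.959964  # 0.975 quantile of the standard normal distribution
lower, upper = Q_forecast - z * Q_sd, Q_forecast + z * Q_sd
print(Q_forecast, Q_var, (lower, upper))
```

Under these assumptions the point forecast is 1850 with forecast variance 30100; a different investment equation or a non-zero $\sigma_{12}$ or $\sigma_{13}$ would change both figures.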


2. SUPPLEMENTARY QUESTIONS 


Question 1.9 


(a) A certain market has the following demand and supply functions in 
each period 


DSN 2 = SPAY ae 20 

Seo boa 

D=S 
where D = demand, S = supply, P = price, Y = income and U is a random 
variable. If U = 0 with probability 4 and U = 1 with probability }, and if 
Y = 10, compute the probability that the price exceeds 2 


(i) in 2 periods out of 3, 
(ii) in more than 80 periods out of 100. 


(b) A buffer stock authority is set up to stabilise price within the interval 
2 <P S< 2.2. What is the expected number of interventions by the 
authority in 100 periods? 


(Adapted from University of Essex BA examinations, 1976.) 


Question 1.10 
In the model 
Vp SV sta2x, Ea (tf = 1, 2, 3) 
the $y_t$ are observable random variables, the $x_t$ are known to take on the
fixed values $x_1 = 0$, $x_2 = 1$, $x_3 = 2$ and the $u_t$ are serially independent
random errors which have the following discrete probability distribution 
for each value of t: 


Ut Probability 
2 


(a) Obtain the sampling distributions of the ordinary least squares 
estimator of the slope coefficient and of the sum of squared residuals 
from the least squares regression. 


(b) Verify that the sum of squared residuals is an unbiased estimator of 
the variance of the disturbances. 


(Adapted from University of Essex MA examinations, 1975.) 


Question 1.11 


The observable random variables $y_t$ $(t = 1, \ldots, T)$ and non-random
quantities $x_t$ satisfy the relation
$y_t = \beta x_t + u_t$   (1.11.1)


The $u_t$ are random disturbances each of which has mean zero and variance
$\sigma^2$ and $E(u_s u_t) = 0$ when $s \neq t$. The estimators $b$ and $b^*$ of $\beta$ are defined


(a) Show that both $b$ and $b^*$ are unbiased estimators of $\beta$, but that the
variance of $b$ is greater than that of $b^*$ except when the $x_t$ are all equal in
which case $b = b^*$.


(b) Show also that the ratio of the variance of $b$ to the variance of $b^*$ is an
increasing function of the ratio of the sample variance of the $x_t$ to the
squared sample mean of the $x_t$.
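
The relation in part (b) can be checked numerically. The sketch below assumes one pair of estimators consistent with parts (a) and (b), namely the ratio of sums $b = \sum y_t / \sum x_t$ and least squares through the origin $b^* = \sum x_t y_t / \sum x_t^2$; these forms are our assumption, as are the function names. With them, var($b$) $= T\sigma^2/(\sum x_t)^2$ and var($b^*$) $= \sigma^2/\sum x_t^2$, so $\sigma^2$ cancels in the ratio.

```python
import math


def variance_ratio(x):
    """var(b) / var(b*) for the ASSUMED estimators
    b = sum(y)/sum(x) and b* = sum(x*y)/sum(x^2)."""
    T = len(x)
    sum_x = sum(x)
    sum_x2 = sum(v * v for v in x)
    # var(b) = sigma^2 * T / (sum x)^2 and var(b*) = sigma^2 / sum x^2
    return T * sum_x2 / sum_x ** 2


def one_plus_relative_variance(x):
    """1 + (sample variance of x) / (squared sample mean of x)."""
    T = len(x)
    mean = sum(x) / T
    s2 = sum((v - mean) ** 2 for v in x) / T   # divisor T, not T - 1
    return 1.0 + s2 / mean ** 2


x = [1.0, 2.0, 3.0, 4.0]
print(variance_ratio(x), one_plus_relative_variance(x))
print(variance_ratio([2.0, 2.0, 2.0]))
```

The two functions agree for any non-degenerate `x`, and the ratio equals 1 exactly when all the `x` values coincide.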


(Adapted from University of Auckland B Com. examinations, 1969.) 


Question 1.12 


The observable random variables $y_t$ $(t = 1, \ldots, T)$ and non-random
quantities $x_t$ $(t = 1, \ldots, T)$ satisfy the relation


$y_t = \alpha + \beta x_t + u_t$   $(t = 1, \ldots, T)$   (1.12.1)


8 . EXERCISES IN ECONOMETRICS 


The $u_t$ are random disturbances each of which has mean zero and variance
$\sigma^2$ and $E(u_s u_t) = 0$ when $s \neq t$. It is known that $T = 2n$ for some integer $n$
and the observations $\{(y_t, x_t): t = 1, \ldots, T\}$ are divided into two groups
of $n$ with $\bar{x}_1$ and $\bar{x}_2$ denoting the means of the $x_t$ in each group and $\bar{y}_1$
and $\bar{y}_2$ denoting the means of the $y_t$ in each group. It is proposed that the
parameter $\beta$ in (1.12.1) be estimated by


$\hat{\beta} = (\bar{y}_2 - \bar{y}_1)/(\bar{x}_2 - \bar{x}_1)$   (1.12.2)
(a) Find $E(\hat{\beta})$ and var($\hat{\beta}$).
(b) How would you allocate the pairs of observations into two groups in
order to minimise var($\hat{\beta}$)? Explain why your optimum allocation does
minimise var($\hat{\beta}$).
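
The effect of the allocation on the variance can be explored numerically. The sketch assumes the grouping estimator takes the form $(\bar{y}_2 - \bar{y}_1)/(\bar{x}_2 - \bar{x}_1)$, in which case its variance is $2\sigma^2 / [n(\bar{x}_2 - \bar{x}_1)^2]$ because each group mean of the disturbances has variance $\sigma^2/n$. The function names and the six illustrative $x$ values are ours.

```python
import math
from itertools import combinations


def grouping_variance(group1, group2, sigma2=1.0):
    """Variance of the ASSUMED estimator (ybar2 - ybar1)/(xbar2 - xbar1)."""
    n = len(group1)
    assert len(group2) == n, "groups must be of equal size"
    gap = sum(group2) / n - sum(group1) / n
    return 2.0 * sigma2 / (n * gap ** 2)


def best_split(x):
    """Search every equal-sized split and return the minimum-variance one."""
    n = len(x) // 2
    indices = range(len(x))
    best = None
    for chosen in combinations(indices, n):
        g1 = [x[i] for i in chosen]
        g2 = [x[i] for i in indices if i not in chosen]
        if sum(g1) == sum(g2):
            continue  # equal group means: the estimator is undefined
        v = grouping_variance(g1, g2)
        if best is None or v < best[0]:
            best = (v, sorted(g1), sorted(g2))
    return best


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]       # hypothetical regressor values
ordered = sorted(x)
sorted_split = grouping_variance(ordered[:3], ordered[3:])
alternating = grouping_variance(ordered[0::2], ordered[1::2])
print(sorted_split, alternating, best_split(x))
```

For these values the split into the three smallest and three largest observations gives a variance of 2/27, against 2/3 for the alternating split, and the exhaustive search returns the same ordered split.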


(Adapted from University of Birmingham B Soc Sc examinations, 1966.) 


Question 1.13 


In an investigation of the holding of stocks of finished output it is 
postulated that the level of stocks, $Y_t$, at the end of any year is a linear
function of anticipations of future sales, $X_t^*$:


$Y_t = \alpha + \beta X_t^* + u_t$   (1.13.1)


where $u_t$ is a random disturbance. It is further assumed that $X_t^*$ is
determined from actual sales, $X_t$, according to the following adaptive
expectations mechanism:


$X_t^* - X_{t-1}^* = \lambda(X_t - X_{t-1}^*)$   (where $0 < \lambda < 1$)   (1.13.2)
On the basis of this theory a sample of 61 observations on X and Y is 


employed in an ordinary least squares regression which yields the 
following results: 


Y= 73220 t" Ov OG cha 0.6204 ogee = 0.05 0= (tel a.3) 
(51.3) (0.024) (0.005) D.W. = 2.18 


(numbers in parentheses denote standard errors) 


(a) Explain how a relationship of the form estimated may be derived from 
equations (1.13.1) and (1.13.2). 


(b) Do you consider ordinary least squares to be an appropriate estimation 
technique in this problem? 


(c) On the presumption that ordinary least squares is an appropriate 
estimation technique, interpret the information provided in (1.13.3) as to 
the significance of the coefficients and calculate estimates of the 
theoretical parameters $\alpha$, $\beta$ and $\lambda$. Comment on the results you obtain.
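
As a guide to the algebra involved, the following sketch works through the usual transformation, assuming that the adaptive expectations mechanism takes the standard form $X_t^* - X_{t-1}^* = \lambda(X_t - X_{t-1}^*)$ and that stocks satisfy $Y_t = \alpha + \beta X_t^* + u_t$. The composite disturbance $v_t$ is our notation.

```latex
\begin{align*}
X_t^* &= \lambda X_t + (1-\lambda) X_{t-1}^* \\
Y_t - (1-\lambda) Y_{t-1}
  &= \alpha\lambda + \beta\bigl[X_t^* - (1-\lambda) X_{t-1}^*\bigr]
     + u_t - (1-\lambda) u_{t-1} \\
Y_t &= \alpha\lambda + \beta\lambda X_t + (1-\lambda) Y_{t-1} + v_t,
  \qquad v_t = u_t - (1-\lambda) u_{t-1}
\end{align*}
```

If the fitted coefficients on the constant, $X_t$ and $Y_{t-1}$ are written $c_0$, $c_1$ and $c_2$ (again our notation), then under these assumptions $\lambda = 1 - c_2$, $\beta = c_1/\lambda$ and $\alpha = c_0/\lambda$. Note that $v_t$ is a moving average of the original disturbances, so it is in general correlated with $Y_{t-1}$.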


(Adapted from University of Essex BA examinations, 1975.) 
Question 1.14


In the following model $Y$ is real national income, $Y^e$ is real expected
national income, $E$ is real private expenditure, $G$ is real government
expenditure, $M$ is real money balances and $R$ is the rate of interest. The
superscript * attached to a variable indicates that it is a planned as 
opposed to an actual or realised value; and a bar over a variable indicates 
that it is autonomously determined (or exogenous). 


E} = A, +kY?—ak;-, 
| cd ara 3 (| mae) H tai 


M; = dM; ed O Memes AA VS ype 
=i 

R, = a(M,—M}) 
? = OYss; sig laced 0D ery 


(a) Examine each of the relations in the above model for a causal 
interpretation. 


(b) Draw an arrow diagram to indicate the causal links between the 
variables and indicate whether the model is recursive. 


(For a model similar to the above but which is not recursive see Laidler, 


1973.) 


3. SOLUTIONS 


Solution 1.1 


Each of the quotations in the question provides a concise view of the 
subject matter of econometrics. Our intention in this solution is to outline 
in more detail the nature of econometrics and the activities that it involves. 
We hope to build on the quotations and analyse the more fundamental 
aspects of econometric model building in our answer. In this way, our 
discussion will remain generally in the area of the question but is not 
intended to be a closely argued response to or dissection of the definitions 
we have quoted in (a), (b) and (c). 

Econometrics is a comparatively young branch of economics. It began 
to develop in a fairly distinct form in the 1930’s when the need for a 
quantitative assessment of the implications of national economic policy 


decisions became most obvious during the world depression. But many
economists had for long been aware of the shortcomings of economic 
analysis without quantification. One of the most urgent appeals for 
quantitative information on practical issues of economic policy was made 
as early as the beginning of this century by Pigou (1908). But it was not 
until the work of Frisch and Tinbergen in the 1930’s and Haavelmo in the 
early 1940’s that a scientific basis for the quantification of economic 
relations began to appear (see Haavelmo, 1944, in particular). 

Since the use of a scientific approach has played such an important 
role in the development of econometrics it will be helpful to take this as 
the starting point in our discussion. Braithwaite (1968, p.1) states that 

the function of a science . . . is to establish general laws covering the behaviour 
of empirical events or objects with which the science in question is concerned, 
and thereby to enable us to connect together our knowledge of the separately 
known events, and to make reliable predictions of events as yet unknown. 


Popper (1963, p.222) takes a similar view: 


The conscious task before the scientist is always the solution of a problem
through the construction of a theory which solves the problem; for example, by
explaining unexpected and unexplained observations.

The scientific method consists, first, of formulating a theory or an
‘axiomatized deductive system’ (Popper, 1963, p.221), that is, a set of
hypotheses arranged in order with ‘the hypotheses at the highest level 
being those which occur only as premises in the system (and) those at the 
lowest level being those which occur as conclusions in the system’ 
(Braithwaite, 1968, p.12). These conclusions are sometimes known as the 
predictions of the theory. The theory itself, including its predictions, will 
have an overall structure resting on axioms and logical reasoning similar 
to that which characterises the deductive method of pure mathematics 
and logic. When a theory has been conceived and its logical implications 
explored, we may wish to examine the ability of the theory to explain 
observed phenomena. To do so we need to record the predictions of a 
theory and to compare these with the observed facts which the theory 
purports to explain. 

To proceed in this way in economics, we need numerical observations 
of the economic quantities in which we are interested and a procedure 
for measuring the relationships between the various economic quantities 
that are implied by an underlying economic theory. We must also take 
account of the fact that a theory and its predictions cannot normally be 
accepted or rejected with complete confidence (i.e. with a probability of 
unity) on a given set of data; there is almost always a positive probability 
of an incorrect decision (or inference) being made. Hence we need rules 
for testing the validity of the relationships suggested by the theory on the 
strength of their correspondence with the actual observations. It is with 
these various procedures that the subject of econometrics is concerned. 

To take this discussion any further, it will be helpful to consider the 
very useful set of guidelines concerning the subject matter of
econometrics which were laid out in Haavelmo (1944) and later by 
Bergstrom (1966). Haavelmo and Bergstrom found it convenient to speak 
of econometrics in terms of the activities it includes. These are: 


(a) the formulation of econometric models; 

(b) the estimation and statistical testing of these models with observed 
data; and 

(c) the use of these models for prediction and policy purposes. 


To clarify the concept of an econometric model we need to recall the 
meaning of the term ‘model’ itself. One definition of a model is ‘a 
simplified representation of a real-world process’. Popper (1959, p.142) 
believes that a model should be as simple as possible: 

Simple statements, if knowledge is our object, are to be prized more highly
than less simple ones because they tell us more; because their empirical content
is greater; and because they are better testable.


Friedman (1953, p.14) also expresses this view: 


A hypothesis is important if it ‘explains’ much by little, that is, if it abstracts 
the common and crucial elements from the mass of complex and detailed 
circumstances surrounding the phenomena to be explained and permits valid 
predictions on the basis of them alone. 

The choice of a simple model to explain complex phenomena in the 
real world may lead to the criticisms that the model is oversimplified and 
that the assumptions that underlie it are unrealistic. Such criticisms are 
often made of the models which are developed in economic theory and 
which we call economic models. Koopmans (1957) argues that the use of 
models in economics can be defended against such criticisms if we look 
upon economic theory as a sequence of models 

. .. that seek to express in simplified form different aspects of an always more 
complicated reality. At first, these aspects are formalised as much as feasible in 
isolation, then in combinations of increasing realism. . . . The study of simpler 
models is protected from the reproach of unreality by the consideration that 
these models may be prototypes of more complicated subsequent models. The 
card file of successfully completed pieces of reasoning represented by these 
models can then be looked upon as the logical core of economics, as the 
depository of available economic theory. (pp.142-143)

The premises on which each member of this sequence of models rests
involve approximations to reality and often comprise what seem to be 
rather crude simplifications of the objectives behind the behaviour of 
various economic agents such as consumers and producers in an actual 
economy. But the models themselves 
. .. exhibit in a striking manner the power of deductive reasoning in drawing 
conclusions which, to the extent one accepts their premises [our italics, P—W], 
are highly relevant to questions of economic policy. In many cases the 
knowledge these deductions yield is the best we have, either because better 
approximations have not been secured at the level of the premises, or because 
comparable reasoning from premises recognised as more realistic has not been 
completed or has not yet been found possible. (Koopmans, 1957, p.142)

Friedman (1953) argues that


... the relevant question to ask about the assumptions of a theory is not 
whether they are descriptively ‘realistic’, for they never are, but whether they 
are sufficiently good approximations for the purpose in hand. And this question 
can be answered only by seeing whether the theory works, which means whether 
it yields sufficiently accurate predictions. (p.15) 
A closely related view has recently been put forward by Fair (1974) who 
looks on 
. ..a theoretical model ...as not so much true or false as useful or not useful. 
The model is useful if it aids in the specification of empirical relationships that 
one would not normally have thought of from a simpler model and that are in 
turn confirmed by the data. (p.16)

It is with the ‘specification of empirical relationships’ that the primary 
task of econometrics is concerned (activity (a) above). In this activity, the 
econometrician can be guided by the models which have been developed, 
in abstract, in economic theory within the particular area under study as 
well as the evidence that may have accumulated from previous empirical 
studies in the same area. He must then select the economic ideas that seem 
appropriate to the phenomena in question and reframe the economic 
model which embodies these ideas into a form sufficiently precise to be 
estimated from a series of observations on the relevant variables. Once the 
model is in this form it becomes known as an econometric model. 

To clarify the decisions that need to be made in the passage to an 
econometric model we note that: (a) the relationships in an economic 
model often take a form which is too general for statistical fitting; and 
(b) economic models normally, but not necessarily, exclude random 
elements in behaviour. In formulating an econometric model, therefore, 
it is usually necessary to decide which variables should be included 
explicitly in the model (often this will depend on the statistical data that 
are available) and what functional form the relationship between these 
variables should take. The formulation of an econometric model also 
involves the introduction of random disturbances to allow for random 
elements in behaviour that are not accounted for in the underlying theory 
and to allow for errors resulting from the omission of variables (whose 
individual effects are thought to be unimportant) and from the possible 
mis-specification of the functional form of the relationships; in addition, 
the model may involve random errors of measurement in the observable 
economic variables to account for the inaccuracies that may be present in 
the statistical data to be used. We will discuss some of these points further 
in solution 1.2. 

Bergstrom (1966 and 1967) has provided useful definitions of the terms 
‘economic model’ and ‘econometric model’ which express the above ideas 
in a precise way. With some adaptation his definitions are as follows: 


(a) An economic model is any set of assumptions and relationships 
which approximately describe the behaviour of an economy or a sector 
of an economy. 

(b) An econometric model is normally composed of two parts: first, a 
system of equations relating observable economic variables and 
unobservable random variables called disturbances (representing the 
outcome of political, social and other events not directly incorporated 
into the equations themselves) and measurement errors (representing 
errors of measurement in the observations of the economic variables); 
second, a set of assumptions about the stochastic properties of the 
random variables (including perhaps — but not necessarily — their 
probability distribution). 


The subject of estimation and statistical testing which we have given 
as the second activity in econometrics (see (b) above) suggests a close link 
with the subject of mathematical statistics. It is, indeed, true that many of 
the statistical methods in use in econometrics draw heavily on procedures 
which have been developed in the literature of mathematical statistics and 
applied in the natural sciences. But there are important differences. In the 
natural sciences the data are often, but not always (cf. astronomy), 
subject to experimental control so that an experiment can, for instance, 
be designed to highlight the particular effects in which an investigator is 
interested; and the number of sample observations in an experiment 
can often be readily increased (although this is sometimes an expensive 
process), making established procedures of statistical inference that are 
based on large samples more reliable. But these conditions frequently do 
not apply to economic data. An econometrician often has to deal with 
only a small quantity of data, some of which may well contain large 
inaccuracies or errors of measurement. The data are rarely subject to 
experimental control and almost always embody non-economic 
influences. Moreover, the models that are used in econometrics typically 
involve a priori parameter restrictions (reflecting the information derived 
from economic theory which we wish to utilise in the specification of the 
model) and these restrictions often lead us to rather complicated systems 
of equations which can best be handled by statistical procedures rather 
different from those in use in the natural sciences. 

Econometric model building is not without its critics. Serious doubts 
are sometimes expressed about the particular specification of an 
econometric model on the grounds that it is not an adequate empirical 
representation of the underlying economic model. It may be thought, for 
example, that the use of a linear econometric model, or the assumption 
that the econometric model has a time invariant functional form 
throughout the sample period, are poor approximations. Another criticism 
often voiced is that an econometric model is not compatible with the type 
of data to be used. An econometric model based on an economic model of 
the individual economic agent or commodity may not be suitable for use 
with data aggregated over individuals or commodities. Problems may also 
occur if the data are aggregated over a period of time longer than that to 
which the economic model applies. 

Even when an econometric model is found to be compatible with the 
data on the basis of conventional (non-predictive) statistical tests, there 
may still be problems. The model may, for example, be found to be 
compatible with several different (and competing) underlying economic 
hypotheses. Alternatively, the data may afford support for different 
econometric models. Often an econometric model is found to perform 
well on one set of data but to perform badly on another set (for instance, 
by poor forecasting). This is frequently important evidence of the 
inadequacy of the specification of the econometric model. Predictive tests 
such as this should be part of the standard procedures. 

The adequacy of conventional predictive and non-predictive statistical 
tests and the suitability of the method of estimating the econometric 
model may depend critically upon the (often implicitly assumed) 
stochastic properties of the disturbances of the model — see the second 
part of Bergstrom’s definition of an econometric model given above. The 
failure of the disturbances to satisfy certain assumptions (such as serial 
independence) can invalidate the tests and lead to incorrect inferences. It 
can also mean that the estimation technique produces estimates with 
undesirable statistical properties. 

The recent study by Granger and Newbold (1974) reinforces much of 
the textbook advice on this point. To help avoid these problems, tests 
should be performed of these assumptions. 

An interesting challenge to econometric model building has recently 
been issued by those who prefer to use for forecasting purposes time 
series modelling techniques that do not necessarily link with econometric 
theory, rather than econometric models. In this connection the results of 
Cooper (1972), Nelson (1972), Granger and Newbold (1974) and Christ 
(1975) are of special interest. The time series modelling approach is to fit 
simple parametric models (called ARIMA models — see question 7.16) to 
univariate and, more recently, to multivariate economic time series and 
then to use these models for prediction. In some cases this approach has 
been successful, but not so in others (see Chatfield and Prothero, 1973). 
The significance of this challenge for econometricians is not so much that 
this type of time series modelling is an alternative to econometric model 
building but rather that it should focus more attention in econometrics 
on improving the specification of both systems and error dynamics. For a 
useful reconciliation of the two approaches, see Wallis (1977). 

These considerations do not make the task of an econometrician any 
easier but should sharpen awareness of the dangers of over-reaching in 
conclusions and encourage acceptance of the view that a model is an 
approximation which, at best, will prove useful in explaining observed 
data. If it is useful in this way, then the results provide an important 
feedback and stimulus to the process of guiding the development of more 
realistic models. This refers not only to the underlying economic 
hypotheses but also to the statistical assumptions underlying the 
construction of the econometric model. 

Each of the definitions in the question embodies the two aspects of the 
scientific method we have discussed: first, the a priori reasoning required 
in the development of economic theory as a deductive system and leading 
to the ‘laws’ that underlie the construction of an econometric model; and 
second, the testing of the predictions of such theory by statistical 
inference using empirical evidence. Thus the econometrician frequently 
finds that there are two rather different jobs of work to be done: assisting 
the theorist in the refinement of models and designing tools of statistical 
inference appropriate to the models to be used. In Malinvaud’s words: 

The art of the econometrician consists as much in defining a good model as in 
finding an efficient statistical procedure. Indeed, this is why he cannot be 
purely a statistician, but must have a solid grounding in economics. Only if this 
is so, will he be aware of the mass of accumulated knowledge which relates to 
the particular question under study and must find expression in the model. 
Malinvaud (1970b, p.723). 


Solution 1.2 


Part (a). Substituting equation (1.2.3) into (1.2.1) and (1.2.2) we obtain 
the ‘structural’ form of the model: 


$$q_t = \alpha_0 - \alpha_1 p_t + \alpha_2 y_t + u_{1t} \qquad (1.2.4)$$


$$q_t = \beta_0 + \beta_1 p_t + \beta_2 t + u_{2t} \qquad (1.2.5)$$


The structural equations (1.2.4) and (1.2.5) embody the a priori 
knowledge derived from economic theory. Thus (1.2.4) is a demand 
function and (1.2.5) is a supply function. The decision of which variables 
to include and which to exclude from these equations is made in the light 
of economic theory. These two equations describe the behaviour of the 
market. They indicate that quantity and price are determined 
simultaneously within the market. For this reason $q$ and $p$ are called 
endogenous variables. On the other hand, $y$ and $t$ are not determined 
within the market but outside it. They are called exogenous variables. 
The disturbances $u_{1t}$ and $u_{2t}$ shift the demand and supply functions 
randomly. They are called structural disturbances or structural errors. 
They represent the combined effect of all omissions and errors. In practice 
it is inevitable that an econometric model such as (1.2.4)-(1.2.5) will be 
an oversimplification of the ideas that underlie the economic model. 
Common reasons are that the wrong functional form is chosen or that 
variables have been omitted or again that the variables included are 
measured with error. All of these errors are captured by random 
disturbance terms. Frequently we appeal to the central limit theorem to 
argue that the disturbances are normally distributed. In principle no single 
omitted variable or functional error should be so large as to dominate the 
disturbance term. If it does, then the central limit theorem will not be 
applicable even approximately. In this case, the variable omitted, for 
example, should be included explicitly in the structural equation. Of 
course, the normality of the disturbance term cannot be taken to be either 
a necessary or a sufficient condition for the satisfactory specification of 
an equation. 

If we want to determine the total effect on an endogenous variable of a 
change in an exogenous variable we use the reduced form of the model. 
The reduced form expresses each endogenous variable as a function of the 
exogenous variables and the structural disturbances. Solving (1.2.4) and 
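(1.2.5) for $q$ and $p$ as a function of $y$, $t$, a constant and a disturbance, 
we obtain the reduced forms 


$$q_t = \frac{\alpha_0\beta_1 + \alpha_1\beta_0}{\alpha_1 + \beta_1} + \frac{\alpha_2\beta_1}{\alpha_1 + \beta_1}\, y_t + \frac{\alpha_1\beta_2}{\alpha_1 + \beta_1}\, t + \frac{\beta_1 u_{1t} + \alpha_1 u_{2t}}{\alpha_1 + \beta_1} \qquad (1.2.6)$$


and 


$$p_t = \frac{\alpha_0 - \beta_0}{\alpha_1 + \beta_1} + \frac{\alpha_2}{\alpha_1 + \beta_1}\, y_t - \frac{\beta_2}{\alpha_1 + \beta_1}\, t + \frac{u_{1t} - u_{2t}}{\alpha_1 + \beta_1} \qquad (1.2.7)$$
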
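The coefficients of the exogenous variables in the reduced form can be 
interpreted as (impact) multipliers. Thus $\partial q_t/\partial y_t = \alpha_2\beta_1/(\alpha_1 + \beta_1)$ and 
$\partial p_t/\partial y_t = \alpha_2/(\alpha_1 + \beta_1)$. It should be noted that $\partial q_t/\partial y_t$ obtained from 
either (1.2.4) or (1.2.5) measures the partial response of $q$ to changes in $y$ 
with $p$, the other exogenous variables and the disturbances held fixed. The 
term $\partial q_t/\partial y_t$ obtained from the reduced form measures the total change 
in $q_t$ for a unit change in $y_t$ with the remaining exogenous variables and 
the disturbances held fixed, but with $p_t$ being allowed to change. Using 
the demand function (1.2.4), the relationship between the two derivatives 
is 

$$(\partial q_t/\partial y_t)_{RF} = (\partial q_t/\partial p_t)_D\,(\partial p_t/\partial y_t)_{RF} + (\partial q_t/\partial y_t)_D = (-\alpha_1)\left[\frac{\alpha_2}{\alpha_1 + \beta_1}\right] + \alpha_2 = \frac{\alpha_2\beta_1}{\alpha_1 + \beta_1} \qquad (1.2.8)$$

Using the supply function we obtain 

$$(\partial q_t/\partial y_t)_{RF} = (\partial q_t/\partial p_t)_S\,(\partial p_t/\partial y_t)_{RF} + (\partial q_t/\partial y_t)_S = (\beta_1)\left[\frac{\alpha_2}{\alpha_1 + \beta_1}\right] + 0 = \frac{\alpha_2\beta_1}{\alpha_1 + \beta_1} \qquad (1.2.9)$$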


The suffixes in (1.2.8) and (1.2.9) denote the equations from which the 
derivative is obtained. 
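
The equality of the total (reduced form) response and the two decompositions in (1.2.8) and (1.2.9) can be checked numerically. The following is a minimal sketch only: the coefficient values are hypothetical, chosen for illustration, and the function name `equilibrium` is not from the text.

```python
from fractions import Fraction as F


def equilibrium(y, t, a0, a1, a2, b0, b1, b2, u1=0, u2=0):
    """Solve the demand (1.2.4) and supply (1.2.5) equations for (q, p)."""
    # Equate demand and supply and solve for the market-clearing price.
    p = (a0 - b0 + a2 * y - b2 * t + u1 - u2) / (a1 + b1)
    # Quantity then follows from the supply function.
    q = b0 + b1 * p + b2 * t + u2
    return q, p


# Hypothetical structural coefficients (an assumption, for illustration only).
coef = dict(a0=F(10), a1=F(1, 2), a2=F(1, 10), b0=F(0), b1=F(1, 2), b2=F(1))

q0, p0 = equilibrium(F(100), F(1), **coef)
q1, p1 = equilibrium(F(101), F(1), **coef)  # income raised by one unit

total_effect = q1 - q0                                  # reduced form response of q
price_response = p1 - p0                                # reduced form response of p
via_demand = -coef["a1"] * price_response + coef["a2"]  # right-hand side of (1.2.8)
via_supply = coef["b1"] * price_response                # right-hand side of (1.2.9)
multiplier = coef["a2"] * coef["b1"] / (coef["a1"] + coef["b1"])

print(total_effect, via_demand, via_supply, multiplier)
```

Because the model is linear, the finite difference for a unit change in income coincides exactly with the derivative, so all four printed values agree.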

In figure 1.2 the above result is shown diagrammatically. $S$ denotes the 
supply function (1.2.5), $D_0$ denotes the demand function (1.2.4) 
before $y_t$ is changed and $D_1$ is the demand function after $y_t$ is increased 
by, say, one unit. The distance $q_1 - q_0 = \alpha_2$ shows the change in demand 
at the original price $p_0$. The distance $q_1 - q_2$ is the reduction in demand 
due to the rise in price from $p_0$ to $p_1$, namely $\alpha_1(p_1 - p_0) = \alpha_1\alpha_2/(\alpha_1 + \beta_1)$. 
The distance $q_2 - q_0$ is the total effect on $q$ of a change in $y$, namely 


$$q_2 - q_0 = (q_1 - q_0) - (q_1 - q_2) = \alpha_2 - \frac{\alpha_1\alpha_2}{\alpha_1 + \beta_1} = \frac{\alpha_2\beta_1}{\alpha_1 + \beta_1}.$$


Figure 1.1 


The reduced form is clearly the appropriate model to use for forecasting 
the value of an endogenous variable for given values of the exogenous 
variables and the disturbances, as it measures the total effect on the 
endogenous variable of each of these variables. Since the endogenous 
variables are random variables we normally predict their expected value. 
The expected value of $q_t$ is obtained by taking the mathematical 
expectation of the reduced form equation for $q_t$ (1.2.6): 


$$E(q_t) = \frac{\alpha_0\beta_1 + \alpha_1\beta_0}{\alpha_1 + \beta_1} + \frac{\alpha_2\beta_1}{\alpha_1 + \beta_1}\, y_t + \frac{\alpha_1\beta_2}{\alpha_1 + \beta_1}\, t + \frac{\beta_1}{\alpha_1 + \beta_1}\, E(u_{1t}) + \frac{\alpha_1}{\alpha_1 + \beta_1}\, E(u_{2t}) \qquad (1.2.10)$$


Part (b). Although a tax rate does not appear in the original model, using 
economic theory we are able to introduce one without invalidating any 
evidence that may have been acquired on the original model, such as 
estimates of the coefficients of the model based on data prior to the 
introduction of the tax. This is a major advantage of econometric models 
over conventional statistical models which typically are not specified using 
economic theory. Introducing the tax affects only the supply function 
(1.2.5) which becomes 


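$$q_t = \beta_0 + \beta_1 (p_t - \tau_t) + \beta_2 t + u_{2t}. \qquad (1.2.11)$$


The reduced form equations are now 


$$q_t = \frac{\alpha_0\beta_1 + \alpha_1\beta_0}{\alpha_1 + \beta_1} + \frac{\alpha_2\beta_1}{\alpha_1 + \beta_1}\, y_t + \frac{\alpha_1\beta_2}{\alpha_1 + \beta_1}\, t - \frac{\alpha_1\beta_1}{\alpha_1 + \beta_1}\, \tau_t + \frac{\beta_1 u_{1t} + \alpha_1 u_{2t}}{\alpha_1 + \beta_1} \qquad (1.2.12)$$


$$p_t = \frac{\alpha_0 - \beta_0}{\alpha_1 + \beta_1} + \frac{\alpha_2}{\alpha_1 + \beta_1}\, y_t - \frac{\beta_2}{\alpha_1 + \beta_1}\, t + \frac{\beta_1}{\alpha_1 + \beta_1}\, \tau_t + \frac{u_{1t} - u_{2t}}{\alpha_1 + \beta_1} \qquad (1.2.13)$$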


Substituting into (1.2.12) and (1.2.13) the values of the coefficients and 
the expected values of $u_{1t}$ and $u_{2t}$ we obtain 


$$E(q_t) = 5 + 0.05y_t + 0.5t - 0.25\tau_t \qquad (1.2.14)$$

$$E(p_t) = 10 + 0.1y_t - t + 0.5\tau_t \qquad (1.2.15)$$


In period one, (1.2.14) and (1.2.15) become 


$$E(q_1) = 5 + 0.05(100) + 0.5(1) - 0.25(1) = 10.25$$

$$E(p_1) = 10 + 0.1(100) - (1) + 0.5(1) = 19.5,$$


our required results. 
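
The period-one forecasts can be reproduced from the reduced form with the tax included. This is a sketch under a stated assumption: the structural coefficient values below are not quoted from the question (which is not reproduced here) but are the values implied by the numerical coefficients in (1.2.14) and by the period-one price calculation. The function names are illustrative.

```python
from fractions import Fraction as F


def reduced_form_with_tax(a0, a1, a2, b0, b1, b2):
    """Reduced form coefficients on (constant, y, t, tau) for q and for p."""
    d = a1 + b1
    q_coef = ((a0 * b1 + a1 * b0) / d, a2 * b1 / d, a1 * b2 / d, -a1 * b1 / d)
    p_coef = ((a0 - b0) / d, a2 / d, -b2 / d, b1 / d)
    return q_coef, p_coef


def expected_value(coef, y, t, tau):
    """Expected value of an endogenous variable when E(u1) = E(u2) = 0."""
    const, c_y, c_t, c_tau = coef
    return const + c_y * y + c_t * t + c_tau * tau


# Assumed structural values, backed out from the solution's numbers.
structural = dict(a0=F(10), a1=F(1, 2), a2=F(1, 10), b0=F(0), b1=F(1, 2), b2=F(1))

q_coef, p_coef = reduced_form_with_tax(**structural)
eq1 = expected_value(q_coef, y=100, t=1, tau=1)
ep1 = expected_value(p_coef, y=100, t=1, tau=1)

print(float(eq1), float(ep1))
```

Only the supply function changes when the tax is introduced, so the same structural values serve before and after the tax; this is the point made at the start of part (b).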
Part (c). In order to find the value of $\tau_t$ for any target value of $E(p_t)$, we 
solve (1.2.15) for $\tau_t$ giving 


$$\tau_t = -20 + 2E(p_t) - 0.2y_t + 2t \qquad (1.2.16)$$

In period one and for a target value of $E(p_1) = 20$, the required tax rate is 

$$\tau_1 = -20 + 2(20) - 0.2(100) + 2 = 2.$$


From (1.2.14) 

$$E(q_1) = 5 + 0.05(100) + 0.5(1) - 0.25(2) = 10.$$


In general, the appropriate model for fixed or flexible target policy 
analysis or for optimal control is the reduced form (see Peston, 1974 and 
Chow, 1975). 
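
The fixed-target calculation in part (c) amounts to inverting the reduced form price equation for the policy instrument. A minimal sketch, using the numerical reduced form of this solution; the function names are illustrative and not from the text:

```python
from fractions import Fraction as F


def expected_p(y, t, tau):
    """Expected price from the numerical reduced form used in part (b)."""
    return 10 + F(1, 10) * y - t + F(1, 2) * tau


def expected_q(y, t, tau):
    """Expected quantity from the numerical reduced form used in part (b)."""
    return 5 + F(1, 20) * y + F(1, 2) * t - F(1, 4) * tau


def required_tax(target_p, y, t):
    """Tax rate that makes the expected price equal to the target."""
    return 2 * (target_p - 10 - F(1, 10) * y + t)


tau_1 = required_tax(target_p=20, y=100, t=1)
print(tau_1, expected_p(100, 1, tau_1), expected_q(100, 1, tau_1))
```

Feeding the solved tax rate back into the price equation recovers the target, which is a quick consistency check on the inversion.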


Solution 1.3 


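Part (a). Aggregating (1.3.1) over all $n$ households we obtain the aggregate 
consumption function 


$$\sum_{i=1}^{n} c_{it} = \sum_{i=1}^{n} \alpha_i + \sum_{i=1}^{n} \beta_i y_{it} + \sum_{i=1}^{n} u_{it} \quad (t = 1, \ldots, T) \qquad (1.3.3)$$

or, using the aggregate variables $c_t$ and $y_t$, 

$$c_t = \alpha + \beta_t y_t + u_t \qquad (1.3.4)$$

where $\alpha = \sum_{i=1}^{n} \alpha_i$, 

$$\beta_t = \sum_{i=1}^{n} \beta_i y_{it} \Big/ \sum_{i=1}^{n} y_{it} = \sum_{i=1}^{n} \beta_i \lambda_{it} \qquad (1.3.5)$$


$u_t = \sum_{i=1}^{n} u_{it}$, $E(u_t) = 0$, $E(u_t^2) = \sum_{i=1}^{n} \sigma_i^2$ and $E(u_t u_s) = 0$ for all $t \neq s$. 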

(i) If $\lambda_{it}$ is non-random then $\beta_t$, the ‘macro’ marginal propensity to 
consume, is also non-random. $\beta_t$ is a weighted average of the time invariant 
individual marginal propensities to consume $\beta_i$. As the weights $\lambda_{it}$ are 
dependent on time, $\beta_t$ is also time dependent. The aggregate consumption 
function, equation (1.3.4), is, therefore, an example of a model with a 
time varying parameter. Only if $\lambda_{it}$ is independent of time is $\beta_t$ a constant 
and (1.3.4) a conventional linear model with constant coefficients and a 
disturbance term with zero mean and constant variance. 
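
The dependence of the ‘macro’ propensity on the income shares can be illustrated with a small numerical sketch. The two households, their propensities and their incomes are hypothetical values chosen for illustration, and `macro_mpc` is an illustrative name.

```python
from fractions import Fraction as F


def macro_mpc(betas, incomes):
    """Income-share weighted average of the individual propensities, as in (1.3.5)."""
    total = sum(incomes)
    return sum(b * y / total for b, y in zip(betas, incomes))


# Hypothetical individual marginal propensities to consume (time invariant).
betas = [F(9, 10), F(6, 10)]

beta_1 = macro_mpc(betas, [F(50), F(50)])    # equal income shares
beta_2 = macro_mpc(betas, [F(25), F(75)])    # shares shift to the second household
beta_3 = macro_mpc(betas, [F(100), F(300)])  # higher incomes, same shares as period 2

print(beta_1, beta_2, beta_3)
```

The aggregate coefficient changes when the shares change (periods 1 and 2) but not when incomes grow with unchanged shares (periods 2 and 3), which is the condition for (1.3.4) to have a constant coefficient.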

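(ii) If $\lambda_{it} = \lambda_i + \epsilon_{it}$ with $\lambda_i$ non-random then 


$$\beta_t = \sum_{i=1}^{n} \beta_i \lambda_i + \sum_{i=1}^{n} \beta_i \epsilon_{it} = \beta^* + \epsilon_t^* \qquad (1.3.6)$$


where $\beta^*$ is a constant and $\epsilon_t^*$ is a random variable with $E(\epsilon_t^*) = 0$, 
$E(\epsilon_t^{*2}) = \omega^2 \sum_{i=1}^{n} \beta_i^2$ and $E(\epsilon_t^* \epsilon_s^*) = 0$ for all $t \neq s$. Thus $\beta_t$ is a serially 
independent random variable with mean $\beta^*$ and variance $\omega^2 \sum_{i=1}^{n} \beta_i^2$. 
Equation (1.3.4) can now be interpreted as a model with a random 
coefficient. Substituting (1.3.6) into (1.3.4) we obtain 

$$c_t = \alpha + \beta^* y_t + u_t^* \qquad (1.3.7)$$


where $u_t^* = u_t + \epsilon_t^* y_t$, $E(u_t^*) = 0$, $E(u_t^{*2}) = \sum_{i=1}^{n} \sigma_i^2 + \omega^2 \left(\sum_{i=1}^{n} \beta_i^2\right) y_t^2$, 
$E(u_t^* u_s^*) = 0$ for all $t \neq s$ and $E(u_t^* y_t) = 0$. The variance of the disturbance 
term in (1.3.7) is dependent on time and, therefore, heteroskedastic. 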


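Part (b). (i) In this case $b$, the regression coefficient of $c_t$ on $y_t$, is 


$$b = \sum_{t=1}^{T} c_t y_t \Big/ \sum_{t=1}^{T} y_t^2 = \sum_{t=1}^{T} \left(\beta_t y_t^2 + u_t y_t\right) \Big/ \sum_{t=1}^{T} y_t^2.$$

Since $y_t$ is, by assumption, non-random 

$$E(b) = \sum_{t=1}^{T} \beta_t y_t^2 \Big/ \sum_{t=1}^{T} y_t^2 \qquad (1.3.8)$$


The variance of $b$ is 


$$E[b - E(b)]^2 = \sum_{i=1}^{n} \sigma_i^2 \Big/ \sum_{t=1}^{T} y_t^2 \qquad (1.3.9)$$


When $\beta_t$ is independent of time and equals $\beta$, say, then $E(b) = \beta$ but the 
variance of $b$ is still given by (1.3.9). 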
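(ii) In this case 


$$b = \beta^* + \sum_{t=1}^{T} u_t^* y_t \Big/ \sum_{t=1}^{T} y_t^2.$$


Hence $E(b) = \beta^*$, that is $b$ is an unbiased estimator of $\beta^*$, and the variance 
of $b$ is 

$$
\begin{aligned}
E(b - \beta^*)^2 &= E\left[\sum_{t=1}^{T} u_t^* y_t \Big/ \sum_{t=1}^{T} y_t^2\right]^2 \\
&= \sum_{t=1}^{T} y_t^2 \left[\sum_{i=1}^{n} \sigma_i^2 + \omega^2 \left(\sum_{i=1}^{n} \beta_i^2\right) y_t^2\right] \Big/ \left(\sum_{t=1}^{T} y_t^2\right)^2 \\
&= \left[\sum_{i=1}^{n} \sigma_i^2 \Big/ \sum_{t=1}^{T} y_t^2\right] + \left[\omega^2 \left(\sum_{i=1}^{n} \beta_i^2\right) \left(\sum_{t=1}^{T} y_t^4\right) \Big/ \left(\sum_{t=1}^{T} y_t^2\right)^2\right].
\end{aligned}
$$


As the variance of $u_t^*$ is not constant for all $t$, $b$ is not an efficient 
estimator of $\beta^*$. We would need to use generalised least squares in order to 
obtain an efficient estimator (see question 2.12). 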
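
The variance expression and the efficiency remark can be checked with exact arithmetic. This is a sketch under stated assumptions: the income values and variance parameters are hypothetical, the regression has no intercept (as in the expression for $b$ above), and the generalised least squares variance is computed for the case where the disturbance variances are known, so that it reduces to weighted least squares.

```python
from fractions import Fraction as F


def disturbance_variances(y, sigma2_sum, omega2, beta_sq_sum):
    """Variance of the composite disturbance in (1.3.7) for each period."""
    return [sigma2_sum + omega2 * beta_sq_sum * v * v for v in y]


def ols_variance(y, var_u):
    """Variance of b = sum(c*y)/sum(y^2) when the disturbances are uncorrelated."""
    syy = sum(v * v for v in y)
    return sum(v * v * w for v, w in zip(y, var_u)) / syy ** 2


def ols_variance_decomposed(y, sigma2_sum, omega2, beta_sq_sum):
    """The same variance written as the sum of the two bracketed terms."""
    syy = sum(v * v for v in y)
    sy4 = sum(v ** 4 for v in y)
    return sigma2_sum / syy + omega2 * beta_sq_sum * sy4 / syy ** 2


def gls_variance(y, var_u):
    """Variance of the weighted least squares estimator with known variances."""
    return 1 / sum(v * v / w for v, w in zip(y, var_u))


# Hypothetical values, for illustration only.
y = [1, 2, 3]
sigma2_sum, omega2, beta_sq_sum = F(2), F(1, 10), F(1, 2)

var_u = disturbance_variances(y, sigma2_sum, omega2, beta_sq_sum)
v_ols = ols_variance(y, var_u)
v_dec = ols_variance_decomposed(y, sigma2_sum, omega2, beta_sq_sum)
v_gls = gls_variance(y, var_u)

print(v_ols, v_dec, float(v_gls))
```

The two forms of the least squares variance agree exactly, and the weighted estimator has the smaller variance whenever the disturbance variances differ across periods.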


Solution 1.4 


Part (a). We start our solution with a brief discussion of the meaning of a 
causal relationship. It would be helpful in this respect if we could provide 
an unambiguous statement about the fundamental concept of causality. 
While attempts to discuss causality in a very general way have led to much 
controversy in philosophy, it can still be argued that the concept of 
causality is of great operational significance in scientific work, particularly 
in the area of experimental science. In economics the importance of the 
notion of causality in the construction and use of models has often been 
emphasized, particularly in the extensive writings of Herman Wold. 
Fortunately, it seems possible to define more narrowly a concept of 
causality which is adequate for model building situations in economics. 
Both Wold (1954) and Simon (1953) have done this and the reader is 
strongly recommended to consult their articles. Both authors stress that 
causality is a theoretical concept which must be interpreted in the context 
of a formal theoretical model. As we have seen in solution 1.1, models in 
economics involve a set of assumptions and relationships among economic 
variables which are often expressed in mathematical form as, for instance, 


$$y = f(x_1, \ldots, x_n) \qquad (1.4.6)$$

Wold then defines this relationship to be causal if the variables 
$x_1, \ldots, x_n$ would be regarded as the ‘cause’ variables and $y$ the ‘effect’ 
variable in a (fictitious) controlled experiment. In other words, if the 
relationship (1.4.6) were being investigated by a controlled experiment, 
the relationship would be said to be causal if the variables $x_1, \ldots, x_n$ 
were under the experimenter’s control and the variable $y$ measured the 
observed effect of the experiment. Of course, controlled experimentation 
in economics is virtually impossible so this way of regarding (1.4.6) must 
remain somewhat abstract. 

It is also possible to think of (1.4.6) as a directed functional 
relationship. That is, according to (1.4.6) the variables $y, x_1, \ldots, x_n$ are 
functionally related and, in addition, the direction of the relationship 
from $x_1, \ldots, x_n$ to $y$ is an essential element of the specification. 
Otherwise, we might well have written, instead of (1.4.6), 


$$x_j = g(y, x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_n) \qquad (1.4.7)$$


with the different function $g(\,)$, and $x_j$ now appearing as a dependent 
variable [under certain conditions on the function $f(\,)$ we can write 
(1.4.6) in the alternative form (1.4.7)]. If (1.4.6) and (1.4.7) are treated 
as equivalent specifications in a theoretical model then that model 
attaches a symmetry to the roles of $y$ and the $x_j$ in the relationship 
(1.4.6). It is this very symmetry which is absent from a causal relationship 
and indeed the ‘common sense’ notion of causality. In a causal 
relationship between a number of variables, direction is an essential 
component which must be clarified at an early stage in the construction of 
the model. When the direction of the relationship is specified then the 
relationship is asymmetrical in much the same way as common sense 
examples of causality are asymmetrical. We give the following two 
examples: 


pot placed over fire → increase in temperature of water in pot 
and 
earthquake → burst dam → valley floods. 


Each of these examples involves a chain of causation. The arrows are an 
essential part of the reasoning and make the overall statements 
asymmetric (if we reverse the arrows the statements no longer square with 
common sense!). We can, in a similar way, build up the concept of 
causality based on the idea of an asymmetrical functional relationship and 
Simon (1953) provides a careful treatment of the subject along these lines. 
The idea of a chain of causation is very useful when we come to analyse 
the underlying nature of an economic model. It has been systematically 
applied by Herman Wold in the development of recursive (or causal chain) 
systems. A recursive model is one which displays the following features 
(Wold, 1954, p.172): 


(i) The model refers to a sequence of years, months or other time units. 

(ii) All relations of the model are causal with two types of variables: 
endogenous which it is the purpose of the model to explain and exogenous, 
which are auxiliary. In every relation of the model the effect variable is thus 
endogenous, while the cause variables are either endogenous or exogenous. 

(iii) The model has one, and only one, causal relation for each endogenous 
variable. 

(iv) Given the development of the exogenous variables and a set of initial values 
for the endogenous variables, the model allows us to calculate, recursively, 
the development of the endogenous variables. 

The nature of the chain of causation in a recursive system can be 
illustrated by the use of arrow diagrams such as the following: 


Recursive model 

$$Y_t = E_t + G_t$$
$$E_t = aY_{t-1} + b$$
$$G_t = G_t \text{ (autonomous)}$$

[Arrow diagram for the variables $Y$, $E$ and $G$ over periods $t-1$, $t$ and $t+1$] 


This model is a simple form of the model given in the question and the 
variables are as they are defined there. The arrow diagram indicates the 
causal connections between the variables and the lags that operate in the 
various relationships. We note that all relations between two variables are 
unilateral and that all arrows between variables in the same time period 
are in the same direction. This is consistent with our notion of directed 
functional relationships. It is easy to see that the model satisfies Wold’s 
criteria (i)—(iv) above and it is, indeed, a recursive system. 
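
Wold's criterion (iv) can be made concrete: in a recursive system each endogenous variable is computed in turn, one relation at a time. The sketch below assumes the simple model reads $Y_t = E_t + G_t$ with $E_t = aY_{t-1} + b$; the parameter values and the function name are hypothetical.

```python
def simulate_recursive(a, b, y0, g_path):
    """Trace out (E_t, Y_t) period by period, given initial income and G_t."""
    y_prev = y0
    path = []
    for g in g_path:
        e = a * y_prev + b  # expenditure depends only on last period's income
        y = e + g           # income follows from expenditure and autonomous G
        path.append((e, y))
        y_prev = y
    return path


# Hypothetical parameter values and a constant path for government expenditure.
path = simulate_recursive(a=0.5, b=10, y0=100, g_path=[20, 20, 20])
print(path)
```

No equation has to be solved at any step: the order of computation within a period (first $E$, then $Y$) mirrors the direction of the arrows.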

Arrow diagrams such as the one we have just employed were used by 
Tinbergen and Wold in the development of models and the analysis of 
their underlying features. They were also systematically used in a 
searching article by Bentzel and Hansen (1954) on the basis for recursive 
and interdependent systems in economics. In an interdependent system 
the arrow diagram has quite different features from those that apply in 
recursive systems. We need only illustrate with the following model, 
closely related to the one we have just considered: 


Interdependent model 

$$Y_t = E_t + G_t$$
$$E_t = aY_t + bY_{t-1} + c$$
$$G_t = G_t \text{ (autonomous)}$$

[Arrow diagram for the variables $Y$, $E$ and $G$ over periods $t-1$, $t$ and $t+1$] 


We notice in the arrow diagram of this model that the arrows between any 
two variables are not always unidirectional. Between Y and E in any one 
time period we find two arrows and hence two relations. This means, for 
example, that E is at once both a cause and an effect of Y. 

Consequently, this model cannot be adequately described by a chain of 
causation. We say that Y and E are jointly and simultaneously determined 
in this system and the model is described as interdependent. The fact that 
interdependent models do not embody consistent chains of causation is 
not in itself a very powerful criticism. These models are best regarded as 
approximations to the underlying process by which the variables are 
determined; and, in this process, the time lag between the change in one 
variable and its influence on another may be sufficiently small (relative to 
the time unit we are using in the model) for interdependence to be a 
suitable approximation. There are many other reasons which justify the 
use of interdependent systems and the reader is urged to consult Bentzel 
and Hansen (1954) for a detailed discussion. We leave the final word in 
our general discussion to them: 
On a very abstract level of economic theory there seems to be a very strong case 
for the recursive type of model. In a certain sense it may be said to be a 
fundamental type of model in economic theory. Theory on this level is, 
however, so abstract that it is hardly fit for empirical testing. When we leave the 
realm of abstract theory and — through elimination of variables, aggregation and 
other simplifications — move into the realm of more ‘realistic’ workable models, 
interdependency may come into the model through several ways. Therefore, on 
this ‘realistic’ level of economic theory interdependency does not seem to be an 
unnatural property of economic models. (Bentzel and Hansen, 1954, p.154.) 
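
The practical difference from a recursive system is that, within a period, the interdependent relations must be solved jointly. A minimal sketch, assuming the interdependent model has the form $Y_t = E_t + G_t$, $E_t = aY_t + bY_{t-1} + c$; the symbols $b$ and $c$, the parameter values and the function name are illustrative assumptions.

```python
def step_interdependent(a, b, c, y_prev, g):
    """Solve the two current-period relations simultaneously for (E_t, Y_t)."""
    if a == 1:
        raise ValueError("no unique solution when a equals 1")
    # Substitute the expenditure relation into the income identity and solve for Y_t.
    y = (b * y_prev + c + g) / (1 - a)
    # Expenditure is then recovered from its own relation.
    e = a * y + b * y_prev + c
    return e, y


# Hypothetical parameter values.
e_t, y_t = step_interdependent(a=0.5, b=0.25, c=10, y_prev=100, g=20)
print(e_t, y_t)
```

Neither variable can be computed before the other: income requires expenditure and expenditure requires income, which is the two-way arrow between $Y$ and $E$ within a period.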


Part (b). We notice that the model given by (1.4.1—5) explicitly involves 
expectations. We assume that these expectations are formed by private 
individuals in the economy to which the model refers and we observe that 
the formation of expectations about the values of variables in a certain 
time period is logically prior to the realisation of the actual values of the 
variables in that time period. Moreover, economic agents (consumers and 
producers) will often form plans on the basis of such expectations and 
these plans too are logically prior to the realisations. These considerations 
suggest that it is possible to develop a causal chain of expectations 
through plans through realisations through to expectations for the next 
period and represent this chain by an arrow diagram as we did in part (a) 
(this was first done by Bentzel and Hansen, 1954, p.156). We have 


[Arrow diagram: in each period expectations lead through plans to realisations, and realisations lead to the expectations of the next period] 


In the model (1.4.1—5) we see that there are no relationships which 
describe the formation of plans on the basis of expectations. But 
expectations of income are determined via (1.4.1) on the basis of the 
realised value of income in the last period and expectations of expenditure 
are formed via (1.4.2) on the basis of income expectations. Expenditure 
expectations are, in fact, then realised in (1.4.3) and combine with 
autonomous government expenditures to determine the realised level of 
income in that period. The arrow diagram for this model is as follows: 

[Arrow diagram over periods $t$, $t+1$ and $t+2$: expected income and expected expenditure (expectations); $G$, $E$ and $Y$ (realisations)] 


As the arrow schemes make clear this model is, indeed, recursive. The 
same principle can be applied in more complicated models that involve the 
formation of plans as well as expectations and for models with many more 
variables. For a further exercise we refer the reader to supplementary 
question 1.13. 


Remark. Our discussion has concentrated on the nature of recursiveness in 
the context of economic models. This seems appropriate because it is in 
the initial formulation of models that the notion of causality must play 
a primary role. However, recursiveness has important implications in the 
context of econometric models also and we find that a probabilistic model 
is recursive only when the random disturbances (that now enter the 
specification of the model) satisfy certain restrictive conditions. But when 
these restrictions are satisfied the statistical procedures that are 
appropriate in the model are greatly simplified. The reader is referred to 
the excellent discussion in Malinvaud (1970b, pp. 612—614 and pp. 679— 
681) for further details. 


Solution 1.5 


Part (a). From the model (1.5.1) and the given values of $x_t$ we see that 

$$y_1 = 3 + u_1$$
$$y_2 = 6 + u_2$$


$$\sum_{t=1}^{2} y_t = 9 + (u_1 + u_2)$$


$$\sum_{t=1}^{2} x_t y_t = (3 + u_1) + 2(6 + u_2) = 15 + (u_1 + 2u_2)$$


Now $\sum_{t=1}^{2} x_t = 3$ and $\sum_{t=1}^{2} x_t^2 = 5$ so that from the definitions (1.5.2) we 
have 

$$a_1 = 3 + \tfrac{1}{3}(u_1 + u_2) \qquad (1.5.3)$$


$$a_2 = 3 + \tfrac{1}{5}(u_1 + 2u_2). \qquad (1.5.4)$$

The sampling distributions of $a_1$ and $a_2$ can now be seen to depend 
(i) on the form of (1.5.3) and (1.5.4), and 
(ii) on the probability distributions of $u_1$ and $u_2$. 
Taking $a_1$ first, we can tabulate the values $a_1$ will take according to the 
values taken by the pair of random errors $(u_1, u_2)$. We have 


    $a_1$                  $(u_1, u_2)$
    $3\tfrac{2}{3}$        $(1, 1)$
    $3$                    $(1, -1)$
    $3$                    $(-1, 1)$
    $2\tfrac{1}{3}$        $(-1, -1)$


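The sampling distribution of $a_1$ is then obtained simply by associating a 
probability with each value taken by the pair $(u_1, u_2)$ in the above table. 
Now $u_1$ and $u_2$ are statistically independent, so that 


$$P(u_1 = 1, u_2 = 1) = P(u_1 = 1)\,P(u_2 = 1) = \tfrac{1}{2} \times \tfrac{1}{2} = \tfrac{1}{4}$$
$$P(u_1 = 1, u_2 = -1) = P(u_1 = 1)\,P(u_2 = -1) = \tfrac{1}{2} \times \tfrac{1}{2} = \tfrac{1}{4}$$
$$P(u_1 = -1, u_2 = 1) = P(u_1 = -1)\,P(u_2 = 1) = \tfrac{1}{2} \times \tfrac{1}{2} = \tfrac{1}{4}$$
$$P(u_1 = -1, u_2 = -1) = P(u_1 = -1)\,P(u_2 = -1) = \tfrac{1}{2} \times \tfrac{1}{2} = \tfrac{1}{4}$$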


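Hence the sampling distribution of $a_1$ is as follows: 


    $a_1$                  Probability
    $3\tfrac{2}{3}$        $\tfrac{1}{4}$
    $3$                    $\tfrac{1}{2}$
    $2\tfrac{1}{3}$        $\tfrac{1}{4}$

In a similar way we obtain for $a_2$ the tabulated values 

    $a_2$                  $(u_1, u_2)$
    $3\tfrac{3}{5}$        $(1, 1)$
    $2\tfrac{4}{5}$        $(1, -1)$
    $3\tfrac{1}{5}$        $(-1, 1)$
    $2\tfrac{2}{5}$        $(-1, -1)$


and, hence, the sampling distribution 

    $a_2$                  Probability
    $3\tfrac{3}{5}$        $\tfrac{1}{4}$
    $3\tfrac{1}{5}$        $\tfrac{1}{4}$
    $2\tfrac{4}{5}$        $\tfrac{1}{4}$
    $2\tfrac{2}{5}$        $\tfrac{1}{4}$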


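From these sampling distributions for $a_1$ and $a_2$ we can readily obtain 
the expected values and variances. We have 


$$E(a_1) = (3\tfrac{2}{3})\tfrac{1}{4} + (3)\tfrac{1}{2} + (2\tfrac{1}{3})\tfrac{1}{4} = 3$$

$$E(a_2) = (3\tfrac{3}{5})\tfrac{1}{4} + (3\tfrac{1}{5})\tfrac{1}{4} + (2\tfrac{4}{5})\tfrac{1}{4} + (2\tfrac{2}{5})\tfrac{1}{4} = 3$$

$$\mathrm{var}(a_1) = E[a_1 - E(a_1)]^2 = (\tfrac{2}{3})^2\tfrac{1}{4} + (-\tfrac{2}{3})^2\tfrac{1}{4} = \tfrac{2}{9}$$

$$\mathrm{var}(a_2) = (\tfrac{3}{5})^2\tfrac{1}{4} + (\tfrac{1}{5})^2\tfrac{1}{4} + (-\tfrac{1}{5})^2\tfrac{1}{4} + (-\tfrac{3}{5})^2\tfrac{1}{4} = \tfrac{1}{5}$$


It follows immediately that $\mathrm{var}(a_1) > \mathrm{var}(a_2)$ as required. 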
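
The tabulation in part (a) can be reproduced by enumerating the four equally likely error pairs. This is a minimal sketch: it takes $x = (1, 2)$ and a slope of 3, which are the values implied by the sums used in this solution, and the helper names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction as F
from itertools import product

X = (1, 2)   # regressor values implied by sum(x) = 3 and sum(x^2) = 5
BETA = 3     # slope implied by y_1 = 3 + u_1 and y_2 = 6 + u_2


def a1(u):
    """Ratio of sums estimator, sum(y) / sum(x)."""
    y = [BETA * x + e for x, e in zip(X, u)]
    return F(sum(y), sum(X))


def a2(u):
    """Least squares estimator, sum(x*y) / sum(x^2)."""
    y = [BETA * x + e for x, e in zip(X, u)]
    return F(sum(x * v for x, v in zip(X, y)), sum(x * x for x in X))


def sampling_distribution(estimator, joint):
    """Map each value of the estimator to its total probability."""
    dist = defaultdict(F)
    for u, prob in joint.items():
        dist[estimator(u)] += prob
    return dict(dist)


def mean_and_variance(dist):
    mean = sum(v * p for v, p in dist.items())
    var = sum((v - mean) ** 2 * p for v, p in dist.items())
    return mean, var


# Independent errors, each equal to +1 or -1 with probability one half.
independent = {u: F(1, 4) for u in product((1, -1), repeat=2)}

dist_a1 = sampling_distribution(a1, independent)
dist_a2 = sampling_distribution(a2, independent)
print(mean_and_variance(dist_a1), mean_and_variance(dist_a2))
```

Both estimators are unbiased, and the least squares estimator $a_2$ has the smaller variance, in line with the result just obtained.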


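Part (b). We can deduce the new sampling distributions of $a_1$ and $a_2$ in this 
case directly from (1.5.3), (1.5.4) and the given joint probability 
distribution of $(u_1, u_2)$. We have 


    $a_1$                  Probability
    $3\tfrac{2}{3}$        $\tfrac{1}{10}$
    $3$                    $\tfrac{8}{10}$
    $2\tfrac{1}{3}$        $\tfrac{1}{10}$


and 


    $a_2$                  Probability
    $3\tfrac{3}{5}$        $\tfrac{1}{10}$
    $3\tfrac{1}{5}$        $\tfrac{4}{10}$
    $2\tfrac{4}{5}$        $\tfrac{4}{10}$
    $2\tfrac{2}{5}$        $\tfrac{1}{10}$


We now obtain for the expected values and variances of $a_1$ and $a_2$: 

$$E(a_1) = (3\tfrac{2}{3})\tfrac{1}{10} + (3)\tfrac{8}{10} + (2\tfrac{1}{3})\tfrac{1}{10} = 3$$

$$E(a_2) = (3\tfrac{3}{5})\tfrac{1}{10} + (3\tfrac{1}{5})\tfrac{4}{10} + (2\tfrac{4}{5})\tfrac{4}{10} + (2\tfrac{2}{5})\tfrac{1}{10} = 3$$

$$\mathrm{var}(a_1) = E[a_1 - E(a_1)]^2 = (\tfrac{2}{3})^2\tfrac{1}{10} + (-\tfrac{2}{3})^2\tfrac{1}{10} = \tfrac{4}{45}$$

and 

$$\mathrm{var}(a_2) = E[a_2 - E(a_2)]^2 = (\tfrac{3}{5})^2\tfrac{1}{10} + (\tfrac{1}{5})^2\tfrac{4}{10} + (-\tfrac{1}{5})^2\tfrac{4}{10} + (-\tfrac{3}{5})^2\tfrac{1}{10} = \tfrac{13}{125}.$$

We now see that 

$$\mathrm{var}(a_1) = \tfrac{4}{45} = 0.0889 < 0.104 = \tfrac{13}{125} = \mathrm{var}(a_2)$$


as required. 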
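
The reversal of the variance ranking can be verified directly from (1.5.3) and (1.5.4). The sketch rests on an assumption: the joint probabilities below are not quoted from the question (which is not reproduced here) but are the values implied by the variances 0.0889 and 0.104 and by the unbiasedness of both estimators. The helper name `moments` is illustrative.

```python
from fractions import Fraction as F


def moments(weights, joint, beta=3):
    """Mean and variance of beta + sum_i weights[i] * u_i under the joint law."""
    values = {u: beta + sum(w * e for w, e in zip(weights, u)) for u in joint}
    mean = sum(values[u] * p for u, p in joint.items())
    var = sum((values[u] - mean) ** 2 * p for u, p in joint.items())
    return mean, var


# Assumed joint distribution of (u1, u2), inferred from the solution's numbers.
joint_b = {
    (1, 1): F(1, 10),
    (1, -1): F(4, 10),
    (-1, 1): F(4, 10),
    (-1, -1): F(1, 10),
}

mean_a1, var_a1 = moments((F(1, 3), F(1, 3)), joint_b)  # weights from (1.5.3)
mean_a2, var_a2 = moments((F(1, 5), F(2, 5)), joint_b)  # weights from (1.5.4)
cov_u = sum(u1 * u2 * p for (u1, u2), p in joint_b.items())

print(var_a1, var_a2, cov_u)
```

Under this distribution the two errors are negatively correlated, which favours the estimator that weights them equally.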


Remark 1. We note from the definitions of $a_1$ and $a_2$ in (1.5.2) that, 
since $x_1$ and $x_2$ are fixed, both $a_1$ and $a_2$ can be regarded as linear 
combinations of the observable random variables $y_1$ and $y_2$. We therefore 
refer to $a_1$ and $a_2$ as linear estimators. Moreover, we see that $a_2$ is the 
ordinary least squares estimator of the slope coefficient. 


Remark 2. In part (a) where $u_1$ and $u_2$ are statistically independent the 
model (1.5.1) satisfies the assumptions of the classical linear regression 
model (see Goldberger, 1964, p.162) and, as a result, the least squares 
estimator $a_2$ has minimum variance in the class of all linear unbiased 
estimators (Goldberger, 1964, p.208 and Johnston, 1972, p.22). The fact 
that $\mathrm{var}(a_1) > \mathrm{var}(a_2)$ as we have seen in part (a) is just an example of 
this general result. In part (b), $a_2$ is no longer the linear unbiased 
estimator which has minimum variance and as we see there the estimator 
$a_1$ now has a smaller variance. This is because $u_1$ and $u_2$ are correlated 
[the reader is asked to check for himself that $E(u_1 u_2) \neq 0$] and so the 
model no longer satisfies the assumptions of the classical linear 
regression model. In the present case, the linear unbiased estimator with 
minimum variance is obtained by the use of generalised least squares 
(Johnston, 1972, pp.208-210 and Goldberger, 1964, p.233). 


Solution 1.6 


Part (a). From equations (1.6.1) and (1.6.2), and using the fact that
$x_1 = 0$ and $x_2 = 1$, we see that

\[ y_1 = u_1, \qquad y_2 = 1 + u_2 \qquad \text{and} \qquad X_1 = v. \]

It now follows from the definitions in (1.6.3) that

\[ a_1 = \frac{1 + u_1 + u_2}{1 + v}, \qquad a_2 = \frac{1 + u_2 + v u_1}{1 + v^2}, \qquad a_3 = \frac{1 + u_2 - u_1}{1 - v}. \tag{1.6.4} \]
We can tabulate the values taken by the estimators in (1.6.4) according to
the values assumed by the triple $(u_1, u_2, v)$ in much the same way as we
did in solution 1.5. We obtain table 1.6.1.


Table 1.6.1
a_1            a_2                 a_3            (u_1, u_2, v)     Probability

 3/(1 + ε)     (2 + ε)/(1 + ε²)     1/(1 − ε)     ( 1,  1,  ε)      1/8
 3/(1 − ε)     (2 − ε)/(1 + ε²)     1/(1 + ε)     ( 1,  1, −ε)      1/8
 1/(1 + ε)      ε/(1 + ε²)         −1/(1 − ε)     ( 1, −1,  ε)      1/8
 1/(1 − ε)     −ε/(1 + ε²)         −1/(1 + ε)     ( 1, −1, −ε)      1/8
 1/(1 + ε)     (2 − ε)/(1 + ε²)     3/(1 − ε)     (−1,  1,  ε)      1/8
 1/(1 − ε)     (2 + ε)/(1 + ε²)     3/(1 + ε)     (−1,  1, −ε)      1/8
−1/(1 + ε)     −ε/(1 + ε²)          1/(1 − ε)     (−1, −1,  ε)      1/8
−1/(1 − ε)      ε/(1 + ε²)          1/(1 + ε)     (−1, −1, −ε)      1/8


The probabilities in the final column of table 1.6.1 are obtained by
noting that $u_1$, $u_2$ and $v$ are statistically independent so that, for instance,

\[ P(u_1 = 1,\, u_2 = 1,\, v = \varepsilon) = P(u_1 = 1)\,P(u_2 = 1)\,P(v = \varepsilon) = (\tfrac{1}{2})^3 = \tfrac{1}{8}. \]


We can extract from table 1.6.1 the following sampling distributions for
$a_1$, $a_2$ and $a_3$ (tables 1.6.2-4).


Table 1.6.2                  Table 1.6.3                       Table 1.6.4
a_1          Probability     a_2                Probability    a_3          Probability
 3/(1 − ε)   1/8             (2 + ε)/(1 + ε²)   1/4             3/(1 − ε)   1/8
 3/(1 + ε)   1/8             (2 − ε)/(1 + ε²)   1/4             3/(1 + ε)   1/8
 1/(1 − ε)   1/4              ε/(1 + ε²)        1/4             1/(1 − ε)   1/4
 1/(1 + ε)   1/4             −ε/(1 + ε²)        1/4             1/(1 + ε)   1/4
−1/(1 + ε)   1/8                                               −1/(1 + ε)   1/8
−1/(1 − ε)   1/8                                               −1/(1 − ε)   1/8


Part (b). We are asked to find $P(0.75 \le a_i \le 1.25)$ for $i = 1, 2, 3$. We must
first find the values of $a_i$ (for each $i$) which lie inside the interval [0.75,
1.25]. From tables 1.6.2-4, we see that the values assumed by the $a_i$
depend on $\varepsilon$, and $\varepsilon$ by assumption satisfies the inequality $0 < \varepsilon < \tfrac{1}{5}$.
There is a corresponding inequality for each value taken by the $a_i$. To find
these inequalities we must first carry out some simple manipulations.

We start with $a_1$. Since $0 < \varepsilon < \tfrac{1}{5}$ we deduce that

\[ 1 > 1 - \varepsilon > \tfrac{4}{5} \quad \text{so that} \quad 1 < \frac{1}{1 - \varepsilon} < \tfrac{5}{4} \]
\[ 1 < 1 + \varepsilon < \tfrac{6}{5} \quad \text{so that} \quad 1 > \frac{1}{1 + \varepsilon} > \tfrac{5}{6} \]

and it follows also that

\[ 3 < \frac{3}{1 - \varepsilon} < \tfrac{15}{4} \quad \text{and} \quad 3 > \frac{3}{1 + \varepsilon} > \tfrac{5}{2}. \]

From these inequalities we now have

\[ P(0.75 \le a_1 \le 1.25) = P[a_1 = 1/(1 - \varepsilon)] + P[a_1 = 1/(1 + \varepsilon)] = \tfrac{1}{4} + \tfrac{1}{4} = \tfrac{1}{2} \]

and

\[ P(0 \le a_1 \le 2) = P[a_1 = 1/(1 - \varepsilon)] + P[a_1 = 1/(1 + \varepsilon)] = \tfrac{1}{2}. \]


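Turning to the sampling distribution of $a_2$, we note that $0 < \varepsilon < \tfrac{1}{5}$
implies that

\[ 1 < 1 + \varepsilon^2 < \tfrac{26}{25} \]

so that

\[ \tfrac{25}{26} < \frac{1}{1 + \varepsilon^2} < 1. \tag{1.6.5} \]

Moreover

\[ 2 < 2 + \varepsilon < \tfrac{11}{5} \quad \text{and} \quad \tfrac{9}{5} < 2 - \varepsilon < 2. \tag{1.6.6} \]

Using the rule that if $a < b$, $c < d$ and $a$, $b$, $c$, $d$ are positive then $ac < bd$
we deduce the following inequalities from (1.6.5) and (1.6.6)

\[ \tfrac{25}{13} < \frac{2 + \varepsilon}{1 + \varepsilon^2} < \tfrac{11}{5} \]

and

\[ \tfrac{45}{26} < \frac{2 - \varepsilon}{1 + \varepsilon^2} < 2. \tag{1.6.7} \]

We can sharpen the first inequality in (1.6.7) by noting that

\[ 2\varepsilon^2 - \varepsilon = \varepsilon(2\varepsilon - 1) \]

is negative when $0 < \varepsilon < \tfrac{1}{5}$. Thus

\[ 2\varepsilon^2 - \varepsilon < 0 \]

and, adding $2 + \varepsilon$ to both sides, we have

\[ 2 + 2\varepsilon^2 < 2 + \varepsilon \]

or

\[ 2 < \frac{2 + \varepsilon}{1 + \varepsilon^2}. \]

Combining this last inequality with (1.6.7) we obtain

\[ 2 < \frac{2 + \varepsilon}{1 + \varepsilon^2} < \tfrac{11}{5}. \tag{1.6.8} \]

We also have

\[ 0 < \frac{\varepsilon}{1 + \varepsilon^2} < \tfrac{1}{5} \tag{1.6.9} \]

and

\[ -\tfrac{1}{5} < \frac{-\varepsilon}{1 + \varepsilon^2} < 0. \tag{1.6.10} \]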


From these inequalities we now deduce that

\[ P(0.75 \le a_2 \le 1.25) = 0 \]

since (1.6.7-10) indicate that $a_2$ never takes on a value in the interval
[0.75, 1.25]; and we see that the only values taken by $a_2$ which lie in the
interval [0, 2] are $(2 - \varepsilon)/(1 + \varepsilon^2)$ and $\varepsilon/(1 + \varepsilon^2)$, so that

\[ P(0 \le a_2 \le 2) = P[a_2 = (2 - \varepsilon)/(1 + \varepsilon^2)] + P[a_2 = \varepsilon/(1 + \varepsilon^2)] = \tfrac{1}{4} + \tfrac{1}{4} = \tfrac{1}{2}. \]

Finally, turning to $a_3$ we see from tables 1.6.2 and 1.6.4 that the
sampling distributions of $a_1$ and $a_3$ are identical. Hence

\[ P(0.75 \le a_3 \le 1.25) = P(0.75 \le a_1 \le 1.25) = \tfrac{1}{2} \]

and

\[ P(0 \le a_3 \le 2) = P(0 \le a_1 \le 2) = \tfrac{1}{2}. \]


We can now tabulate our results on the relative concentration of the three 
estimators in table 1.6.5. 


Table 1.6.5
                  Probability that estimator takes a value in the interval:
Interval          a_1       a_2       a_3
[0.75, 1.25]      1/2       0         1/2
[0, 2]            1/2       1/2       1/2

Remark. We see from table 1.6.5 that both $a_1$ and $a_3$ give a higher
probability than $a_2$ of obtaining a very accurate estimate [i.e. an
estimate close to unity, the true value of the slope coefficient in (1.6.1)].
But they also give a higher probability than $a_2$ of making a very large
error. We can see this from the sampling distributions given in tables
1.6.2-4; we can, in fact, deduce the following from the inequalities we
have obtained above:

\[ P(|a_1 - 1| > 1.25) = \tfrac{1}{2}, \tag{1.6.11} \]
\[ P(|a_2 - 1| > 1.25) = 0, \]
\[ P(|a_3 - 1| > 1.25) = \tfrac{1}{2}, \tag{1.6.12} \]


and the reader may like to check these for himself. 

In choosing between two estimators such as $a_1$ and $a_2$ we must first
decide the relative importance of errors of estimation of different
magnitudes. If we attached great importance to a very accurate estimate
but were not greatly troubled by the prospect of making a large error
then we might well select $a_1$ or $a_3$ in this context; but note from
(1.6.11-12) that the probability of making a large error is itself large in
this case and this must make $a_1$ and $a_3$ less attractive.
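
The interval probabilities of table 1.6.5 can be reproduced by enumerating the eight equally likely triples $(u_1, u_2, v)$. The following is a sketch under stated assumptions: the closed forms in `estimators` are chosen to reproduce the values of table 1.6.1 rather than copied from (1.6.3), the function names are hypothetical, and $\varepsilon = 1/10$ is just one admissible value in $(0, 1/5)$.

```python
from fractions import Fraction as F
from itertools import product


def estimators(u1, u2, v):
    """Values of (a1, a2, a3) for one triple; closed forms are assumed."""
    a1 = (1 + u1 + u2) / (1 + v)
    a2 = (1 + u2 + v * u1) / (1 + v * v)
    a3 = (1 + u2 - u1) / (1 - v)
    return a1, a2, a3


def interval_probs(eps, lo, hi):
    """P(lo <= a_i <= hi) for i = 1, 2, 3, each triple having probability 1/8."""
    probs = [F(0), F(0), F(0)]
    for u1, u2, v in product((1, -1), (1, -1), (eps, -eps)):
        for i, a in enumerate(estimators(u1, u2, v)):
            if lo <= a <= hi:
                probs[i] += F(1, 8)
    return probs


eps = F(1, 10)  # any value with 0 < eps < 1/5 should give the same answer
narrow = interval_probs(eps, F(3, 4), F(5, 4))
wide = interval_probs(eps, 0, 2)
print("P(0.75 <= a_i <= 1.25):", narrow)
print("P(0 <= a_i <= 2):     ", wide)
```

Because $a_1$ and $a_3$ have the same sampling distribution, interchanging their assumed closed forms would leave both rows unchanged.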


Solution 1.7 


Part (a). The model (1.7.1-3) is a simple case of a simultaneous equations
model (we will discuss such models in detail in Chapter 6). The quantity
exchanged between sellers and buyers of the good in the market and the
price at which the exchange takes place are determined jointly by the three
equations (1.7.1), (1.7.2) and (1.7.3). We can then solve this system of
equations for price, $p$, and quantity exchanged $q = q^s = q^d$. We have

\[ 1 + p + u = 3 - p + v \]

so that

\[ p = 1 + \tfrac{1}{2}(v - u) \tag{1.7.4} \]

and then

\[ q = 3 - p + v = 2 + \tfrac{1}{2}(u + v). \]


We can now determine the probability distribution of the free market
price $p$ from equation (1.7.4) and the given probability distributions of
the random disturbances $u$ and $v$. Using the fact that $u$ and $v$ are
statistically independent, we obtain

    p        (u, v)       Probability

    1        ( 1,  1)     1/9
    1/2      ( 1,  0)     1/9
    0        ( 1, −1)     1/9
    3/2      ( 0,  1)     1/9
    1        ( 0,  0)     1/9
    1/2      ( 0, −1)     1/9
    2        (−1,  1)     1/9
    3/2      (−1,  0)     1/9
    1        (−1, −1)     1/9

and thus

    p        Probability

    2        1/9
    3/2      2/9
    1        3/9
    1/2      2/9
    0        1/9

The expected value of $p$ is

\[ \bar{p} = E(p) = (2)\tfrac{1}{9} + (\tfrac{3}{2})\tfrac{2}{9} + (1)\tfrac{3}{9} + (\tfrac{1}{2})\tfrac{2}{9} = 1. \]

Part (b). If the government now intervenes by purchases or sales of $g$ to
ensure that price remains at the fixed value $\bar{p}$ the model becomes

\[ q^s = 1 + \bar{p} + u \tag{1.7.5} \]
\[ q^d = 3 - \bar{p} + v \tag{1.7.6} \]
\[ q^s - q^d - g = 0 \tag{1.7.7} \]

This system determines the extent of government sales or purchases $g$ and
substituting (1.7.5) and (1.7.6) in (1.7.7) we obtain

\[ g = (1 + \bar{p} + u) - (3 - \bar{p} + v) = -2 + 2\bar{p} + u - v \]

and when $\bar{p} = E(p) = 1$ as in part (a) we have

\[ g = u - v \tag{1.7.8} \]


The probability distribution of $g$ is now as follows:

    g        (u, v)       Probability

    0        ( 1,  1)     1/9
    1        ( 1,  0)     1/9
    2        ( 1, −1)     1/9
    −1       ( 0,  1)     1/9
    0        ( 0,  0)     1/9
    1        ( 0, −1)     1/9
    −2       (−1,  1)     1/9
    −1       (−1,  0)     1/9
    0        (−1, −1)     1/9

or

    g        Probability

    2        1/9
    1        2/9
    0        3/9
    −1       2/9
    −2       1/9


We can now deduce the probability that $g$ is at least as great as private
demand $q^d$, i.e.

\[ P(g \ge q^d) = P(u - v \ge 3 - \bar{p} + v) = P(u - v \ge 2 + v) = P(u - 2v \ge 2) \]

since $g = u - v$ from (1.7.8) and $\bar{p} = 1$. But $u - 2v \ge 2$ if and only if
$v = -1$ and either $u = 0$ or $u = 1$. Hence

\[ P(u - 2v \ge 2) = P(u = 0,\, v = -1) + P(u = 1,\, v = -1) = \tfrac{1}{9} + \tfrac{1}{9} = \tfrac{2}{9}. \]
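
Both parts of this solution amount to enumerating the nine pairs $(u, v)$. The sketch below assumes, as our reading of the tables above, that $u$ and $v$ each take the values 1, 0 and −1 with probability 1/3; the constant and variable names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction as F
from itertools import product

SUPPORT = (1, 0, -1)   # assumed values taken by u and by v
PROB = F(1, 9)         # assumed probability of each (u, v) pair (independence)
P_BAR = 1              # support price, set equal to E(p)

p_dist = defaultdict(F)   # distribution of the free market price, eq. (1.7.4)
g_dist = defaultdict(F)   # distribution of government purchases, eq. (1.7.8)
prob_g_ge_qd = F(0)       # P(g >= q^d)

for u, v in product(SUPPORT, SUPPORT):
    p_dist[1 + F(v - u, 2)] += PROB
    g = u - v
    g_dist[g] += PROB
    q_d = 3 - P_BAR + v   # private demand at the support price
    if g >= q_d:
        prob_g_ge_qd += PROB

expected_p = sum(p * prob for p, prob in p_dist.items())
print("E(p) =", expected_p)
print("distribution of g:", dict(sorted(g_dist.items())))
print("P(g >= q^d) =", prob_g_ge_qd)
```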


Solution 1.8 


To forecast $Q_{1977}$ on the basis of the known (and fixed) values
$Y_{1976} = 30$ and $K_{1976} = 75$ we must first express $Q_{1977}$ as a function of
these fixed values and the random disturbances which occur in equations
(1.8.1-3). We can achieve this by successively substituting for $Y_t$ (and
then $C_t$ and $I_t$) in (1.8.1) until we obtain $Q_t$ as a linear combination of
$Y_{t-1}$, $K_{t-1}$ and the random disturbances $u_{1t}$, $u_{2t}$ and $u_{3t}$. To put it
another way, we derive the reduced form equation for $Q_t$.

From (1.8.1) and (1.8.4) we have

\[ Q_t = \alpha + \beta(C_t + I_t) + u_{1t} \]

and using (1.8.3) this becomes

\[ Q_t = \alpha + \beta C_t + \beta\lambda\theta Y_{t-1} - \beta\lambda K_{t-1} + \beta u_{3t} + u_{1t}. \tag{1.8.6} \]

Now from (1.8.2) and (1.8.4) we have

\[ C_t = \gamma + \delta(C_t + I_t) + u_{2t} \]

so that, assuming $0 < \delta < 1$,

\[ C_t = \frac{\gamma}{1 - \delta} + \frac{\delta}{1 - \delta} I_t + \frac{1}{1 - \delta} u_{2t}. \tag{1.8.7} \]


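Then, using (1.8.3) for $I_t$ in (1.8.7), we get

\[ C_t = \frac{\gamma}{1 - \delta} + \frac{\delta\lambda\theta}{1 - \delta} Y_{t-1} - \frac{\delta\lambda}{1 - \delta} K_{t-1} + \frac{\delta}{1 - \delta} u_{3t} + \frac{1}{1 - \delta} u_{2t} \tag{1.8.8} \]

and substituting (1.8.8) directly for $C_t$ into (1.8.6) we find that

\[ Q_t = \alpha + \frac{\beta\gamma}{1 - \delta} + \frac{\beta\delta\lambda\theta}{1 - \delta} Y_{t-1} - \frac{\beta\delta\lambda}{1 - \delta} K_{t-1} + \beta\lambda\theta Y_{t-1} - \beta\lambda K_{t-1} + u_{1t} + \beta u_{3t} + \frac{\beta\delta}{1 - \delta} u_{3t} + \frac{\beta}{1 - \delta} u_{2t} \]
\[ \phantom{Q_t} = \alpha + \frac{\beta\gamma}{1 - \delta} + \frac{\beta\lambda\theta}{1 - \delta} Y_{t-1} - \frac{\beta\lambda}{1 - \delta} K_{t-1} + u_{1t} + \frac{\beta}{1 - \delta}(u_{2t} + u_{3t}). \tag{1.8.9} \]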


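By assumption $Y_{1976} = 30$ and $K_{1976} = 75$ are considered to be fixed and
non-random and the disturbances are normally distributed. It follows,
therefore, from (1.8.9) that $Q_{1977}$ is itself normally distributed with mean
value

\[ E(Q_{1977}) = \alpha + \frac{\beta\gamma}{1 - \delta} + \frac{\beta\lambda\theta}{1 - \delta} Y_{1976} - \frac{\beta\lambda}{1 - \delta} K_{1976} \tag{1.8.10} \]

and variance

\[ \mathrm{var}(Q_{1977}) = \mathrm{var}\left[ u_{1t} + \frac{\beta}{1 - \delta}(u_{2t} + u_{3t}) \right] \]
\[ \phantom{\mathrm{var}(Q_{1977})} = \mathrm{var}(u_{1t}) + \left( \frac{\beta}{1 - \delta} \right)^2 \mathrm{var}(u_{2t} + u_{3t}) \tag{1.8.11} \]

(since, by assumption, $u_{1t}$ is uncorrelated with $u_{2t}$ and $u_{3t}$)

\[ \phantom{\mathrm{var}(Q_{1977})} = \sigma_{11} + \left( \frac{\beta}{1 - \delta} \right)^2 (\sigma_{22} + 2\sigma_{23} + \sigma_{33}). \tag{1.8.12} \]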


To obtain a point forecast of $Q_{1977}$ we can use the mean value (1.8.10)
above which, when we substitute the given parameter values (assumed in
the question to be the true values), becomes

\[ E(Q_{1977}) = 100 + \frac{\beta\gamma}{1 - \delta} + 30\left( \frac{\beta\lambda\theta}{1 - \delta} \right) - 75\left( \frac{\beta\lambda}{1 - \delta} \right) \]
\[ \phantom{E(Q_{1977})} = 100 + 1{,}000 + 4{,}500 - 3{,}750 = 1{,}850. \]


On the assumption of normality, an interval within which there is a 95%
probability that $Q_{1977}$ will lie is just

\[ E(Q_{1977}) \pm 1.96\,[\mathrm{var}(Q_{1977})]^{1/2} = 1{,}850 \pm 1.96 \left[ \sigma_{11} + \left( \frac{\beta}{1 - \delta} \right)^2 (\sigma_{22} + 2\sigma_{23} + \sigma_{33}) \right]^{1/2} \]

and using the given values for the parameters, this becomes

\[ 1{,}850 \pm 1.96\,(100 + 30{,}000)^{1/2} \]

or

\[ 1{,}850 \pm 1.96\,(173.4935) \]

i.e. the interval

(1509.9528, 2190.0472).
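
The arithmetic of the point forecast and the 95% interval can be checked in a few lines. This is a minimal sketch: it takes the four terms of the mean as 100, 1,000, 4,500 and 3,750 (the last chosen so that they add to the stated 1,850, which is an assumption on our part) and the variance as 100 + 30,000; the variable names are hypothetical.

```python
import math

# Terms of E(Q_1977) from (1.8.10); the last term is assumed to be 3,750.
mean_q = 100 + 1_000 + 4_500 - 3_750

# var(Q_1977) from (1.8.12) with the given parameter values.
var_q = 100 + 30_000
std_q = math.sqrt(var_q)

# 95% interval under normality.
half_width = 1.96 * std_q
interval = (mean_q - half_width, mean_q + half_width)

print(f"point forecast = {mean_q}")
print(f"standard deviation = {std_q:.4f}")
print(f"95% interval = ({interval[0]:.4f}, {interval[1]:.4f})")
```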


Remark. We should observe that the above forecast and prediction interval
for $Q_{1977}$ are conditional on the given values of the variables $Y_{1976}$ and
$K_{1976}$. Moreover, the assumption that the true values of the parameters in
the model are known is an important simplification. In practice, these
parameter values will not be known and must be estimated from sample
observations of the variables. The estimates so obtained will then have a
sampling distribution which we will need to consider in constructing a
prediction interval for $Q_{1977}$. We will discuss this type of difficulty in
solution 7.15 below.


CHAPTER 2 


The linear model 


0. INTRODUCTION 


In this chapter we are concerned with single equations which are linear 
both in variables and in coefficients or which can be rewritten in this way. 
We shall make extensive use of the following notation throughout the 
chapter: the linear model will be written in the form 


\[ y = X\beta + u \tag{2.0.1} \]


where $y$ is a $T \times 1$ vector of observations on the dependent variable, $X$ is a
$T \times k$ matrix of observations on $k$ non-random explanatory variables, $\beta$ is
a $k \times 1$ vector of coefficients, $u$ is a $T \times 1$ vector of random disturbances
and $k < T$. Further information about the definitions of these variables
will be given in the questions when it is required. On occasion the linear
model (2.0.1) will be written alternatively as


\[ y_t = \sum_{i=1}^{k} \beta_i x_{it} + u_t \qquad (t = 1, \ldots, T) \tag{2.0.2} \]


where $y_t$ is the $t$th element of $y$, $x_{it}$ is the $it$th element of $X'$, $\beta_i$ is the $i$th
element of $\beta$ and $u_t$ is the $t$th element of $u$; (2.0.2) will also be
written in the form


\[ y_t = x_t'\beta + u_t \qquad (t = 1, \ldots, T) \]


where $x_t' = (x_{1t}, x_{2t}, \ldots, x_{kt})$ is the $t$th row of $X$.

A number of abbreviations are used in this chapter. For example, 
ordinary least squares is often written as OLS, the residual sum of squares 
of a regression is written RSS, the explained sum of squares is written ESS 
and the maximum likelihood estimator is sometimes written MLE. 
Sometimes it will be assumed that $u$ is a normally distributed vector with
zero mean and covariance matrix $\sigma^2 I_T$; we then say that $u$ is $N(0, \sigma^2 I_T)$.
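
To fix ideas about the abbreviations, the sketch below computes the OLS estimates, RSS and ESS for the simplest case of (2.0.2), a constant and one explanatory variable. The data are invented purely for illustration and the function name `ols_simple` is hypothetical; nothing here is taken from the questions that follow.

```python
def ols_simple(x, y):
    """OLS of y on a constant and x; returns intercept, slope, RSS, ESS, TSS."""
    T = len(y)
    x_bar = sum(x) / T
    y_bar = sum(y) / T
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * xi for xi in x]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # residual sum of squares
    ess = sum((fi - y_bar) ** 2 for fi in fitted)            # explained sum of squares
    tss = sum((yi - y_bar) ** 2 for yi in y)                 # total sum of squares
    return intercept, slope, rss, ess, tss


# Illustrative data only (T = 5 observations).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope, rss, ess, tss = ols_simple(x, y)
print(f"intercept = {intercept:.4f}, slope = {slope:.4f}")
print(f"RSS = {rss:.4f}, ESS = {ess:.4f}, TSS = {tss:.4f}, R^2 = {ess / tss:.4f}")
```

With a constant term in the regression the decomposition TSS = ESS + RSS holds, which is what makes $R^2 = \mathrm{ESS}/\mathrm{TSS}$ lie between zero and one.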

1. QUESTIONS


Question 2.1 
(a) Estimate by OLS the linear model

\[ y_t = \alpha + \beta x_{1t} + \gamma x_{2t} + u_t \qquad (t = 1, \ldots, T) \tag{2.1.1} \]

given the following sample moment matrix

                 y − ȳ     x_1 − x̄_1     x_2 − x̄_2

    y − ȳ        2000      100           90
    x_1 − x̄_1    100       10            5
    x_2 − x̄_2    90        5             5

and sample means $\bar{y} = 1200$, $\bar{x}_1 = 100$ and $\bar{x}_2 = 50$, obtained from
$T = 100$ observations.

(b) If the $u_t$ are distributed as independent $N(0, \sigma^2)$ variables, estimate
$\sigma^2$.

(c) Estimate the covariance matrix of the estimates of $\alpha$, $\beta$ and $\gamma$ and
hence find the standard errors of these estimates.

(d) Calculate $R^2$ and $\bar{R}^2$.


Question 2.2 


In the linear model $y = X\beta + u$ it is assumed that $u$ is $N(0, \sigma^2 I_T)$.

(a) Derive appropriate tests of the hypothesis $H_0: \beta = 0$ against $H_1: \beta \neq 0$
in the following two cases: (i) when $\sigma^2$ is known, and (ii) when $\sigma^2$ is
unknown.

(b) For the model of question (2.1), test the hypothesis

\[ H_0: \alpha = \beta = \gamma = 0 \]

against the alternative

\[ H_1: \alpha, \beta, \gamma \neq 0 \]

(i) when $\sigma^2$ is known to be 400, and (ii) when $\sigma^2$ is unknown.

(c) Suppose that you are given estimates of the coefficient vector $\beta$ and $\sigma^2$
as well as $R^2$ uncorrected for the constant term, how would you then
perform the test in c (ii)?


Question 2.3 


In the linear model

\[ y = X_1\beta_1 + X_2\beta_2 + u \tag{2.3.1} \]

$u$ is $N(0, \sigma^2 I_T)$, $\beta_1$ is a $k_1 \times 1$ vector, $\beta_2$ is a $k_2 \times 1$ vector and
$X = (X_1 : X_2)$ is a $T \times k$ matrix with $k = k_1 + k_2$.
(a) Derive a test of the hypothesis $H_0: \beta_1 = 0,\ \beta_2 \neq 0$ against
$H_1: \beta_1, \beta_2 \neq 0$. Show with proof how the test statistic is distributed.
(b) Relate this test to the likelihood ratio test.
(c) For the model of question (2.1) test the hypotheses

\[ H_0: \alpha \neq 0,\ \beta = \gamma = 0 \]

and

\[ H_0': \alpha, \beta \neq 0,\ \gamma = 0 \]

against

\[ H_1: \alpha, \beta, \gamma \neq 0 \]


Question 2.4 
In the model

\[ y_t = \beta_0 + \beta_1 x_{1t} + \beta_2 x_{2t} + u_t \qquad (t = 1, \ldots, T) \tag{2.4.1} \]

$E(u_t) = 0$, $E(u_t u_s) = \sigma^2$ for $t = s$ and zero for $t \neq s$.

(a) Show that $r^2_{y1.2}$, the squared partial correlation coefficient between
$y$ and $x_1$, given $x_2$, can be interpreted as the proportional reduction in
the residual sum of squares due to adding $x_1$ given that $x_2$ is already an
included regressor. Hence obtain an expression for $r^2_{y1.23 \ldots p}$.

(b) Show for the general linear model that an additional regressor
increases $\bar{R}^2$ only if its $t$ statistic is greater than unity. What significance
has this for maximising $\bar{R}^2$?

(c) For the model of question (2.1) find the partial correlation of $y$ with
$x_2$ given $x_1$ and verify that these data satisfy (b) above for $x_2$.


Question 2.5 


(a) Show how it is possible to use $b$, the OLS estimator of $\beta$ in the linear
model $y = X\beta + u$, to provide a more efficient estimator of $\beta$ when it is
known that $\beta$ satisfies the independent linear constraints $R\beta = r$.

(b) For the model of question (2.1), test the hypothesis $H_0: \alpha \neq 0$,
$5\beta = \gamma$ against $H_1: \alpha \neq 0$, $5\beta \neq \gamma$ if the $u_t$ are distributed as independent
$N(0, \sigma^2)$ variables.


Question 2.6 
(a) Show how the test for the significance of an additional set of
explanatory variables in a regression can be used to test for structural
change. State clearly any assumptions you make.

(b) What problems may arise in using this test? Explain how they can be
overcome.


Question 2.7 


Instead of estimating the vectors of coefficients $\beta_1$ and $\beta_2$ from the linear
model

\[ y = X_1\beta_1 + X_2\beta_2 + u, \tag{2.7.1} \]

where the disturbance $u$ has mean zero, it is decided to use OLS on the
following equation:

\[ y = X_1^*\beta_1 + X_2\beta_2 + u^* \tag{2.7.2} \]

where $X_1^*$ are the residuals of the regression of $X_1$ on $X_2$.

(a) Show that the resulting estimator of $\beta_2$ is identical to the regression
coefficients of $y$ on $X_2$.

(b) Obtain an expression for the bias of this estimator of $\beta_2$.

(c) Prove that the OLS estimator of $\beta_1$ obtained from (2.7.2) is identical
to the OLS estimator of $\beta_1$ obtained from (2.7.1).


(Adapted from University of Essex BA examinations 1975) 


Question 2.8 


(a) Explain how the Gauss—Doolittle pivotal condensation method for 
inverting a matrix can be used to provide a computational algorithm for 
OLS. 

(b) Describe how you would use this method in stepwise regression. 

(c) Illustrate your answer to (b) for the general linear model

\[ y = X_1\beta_1 + X_2\beta_2 + u \]

where $X_1$ is a $T \times k_1$ matrix and $X_2$ a $T \times k_2$ matrix.
(d) The following sample moment matrix is based on T = 12 observations 


ae ee 1 v4 
Xn = X3 1 4 
Naas) 2 5 10 




Without estimating $\beta_1$ and $\beta_2$, use the Gauss-Doolittle method to test the
significance of $\beta_1$ and $\beta_2$ for the model

\[ y_t = \alpha + \beta_1 x_{1t} + \beta_2 x_{2t} + u_t \qquad (t = 1, \ldots, T) \]

where the $u_t$ are distributed as independent $N(0, \sigma^2)$ variables.


Question 2.9 


(a) Prove that the OLS estimators of $\beta_1$ in the following linear models are
identical

\[ y_t = x_t\beta_1 + t\beta_2 + u_t \tag{2.9.1} \]
\[ y_t^* = x_t^*\beta_1 + u_t^* \tag{2.9.2} \]

where $y_t^*$ and $x_t^*$ are de-trended $y_t$ and $x_t$, obtained by regressing $y_t$ and
$x_t$ on $t$ and setting $y_t^*$ and $x_t^*$ equal to the respective residuals.
(b) Hence, show how it is possible to estimate $\beta_1$ in

\[ y_t = x_{1t}\beta_1 + x_{2t}\beta_2 + u_t \tag{2.9.3} \]

using the regression equation

\[ y_t^* = x_{1t}^*\beta_1 + u_t^* \tag{2.9.4} \]

which is a linear transformation of (2.9.3).

(c) How is the significance test for the hypothesis $\beta_1 = 0$ in (2.9.2) related
to that for $\beta_1 = 0$ in (2.9.1) if the $u_t$ are distributed as independent
$N(0, \sigma^2)$ variables?


Question 2.10 


The dependent variable $y$ is regressed on a constant term and the $k + 1$
independent variables $x_1, x_2, \ldots, x_{k+1}$ using the observations

\[ y_t, x_{1t}, \ldots, x_{k+1,t} \qquad (t = 1, \ldots, T). \]

The coefficient of determination $R_1^2$ is calculated from this regression.
Another regression is run, this time using only the first $k$ independent
variables $x_1, \ldots, x_k$. The coefficient of determination $R_2^2$ is calculated
from this second regression.
(a) Show that $R_1^2 \ge R_2^2$.
(b) Find an exact relation between $R_1^2$ and $R_2^2$. Under what condition
is $R_1^2 = R_2^2$?


Question 2.11 


In the linear regression model $y = X\beta + u$, the matrix $X = (i : Z)$ where $i$ is
the sum vector $i' = (1, 1, \ldots, 1)$ and $Z$ is a matrix of $T$ observations on $k$
independent variables. Ignoring the constant term in this model, an
investigator obtains the least squares regression equation $y = Zb + \hat{u}$
where $b = (Z'Z)^{-1}Z'y$.
(a) Show that $y'y = b'Z'Zb + \hat{u}'\hat{u}$.
(b) Construct a numerical example in which

\[ R^2 = 1 - \frac{\hat{u}'\hat{u}}{\sum_t (y_t - \bar{y})^2} < 0 \tag{2.11.1} \]

where $\bar{y}$ is the sample mean of $y_1, \ldots, y_T$.
(c) Using your answer to part (a), suggest a modified goodness of fit
measure $R_*^2$, which satisfies

\[ 0 \le R_*^2 \le 1. \]

(d) If the true model were $y = Z\beta_2 + u$ where $\beta_2$ is the vector of the last $k$
components of $\beta$, would it still be possible to obtain a negative $R^2$ in the
regression $y = Zb + \hat{u}$?


(University of Essex MA examinations 1974) 


Question 2.12 
In the linear model

\[ y = X\beta + u, \]

$X$ is a $T \times k$ matrix, $E(u) = 0$, $E(uu') = \Sigma$ and $\Sigma$ is a known non-singular
matrix.

(a) Obtain the generalised least squares (GLS) estimator of $\beta$ and show
that it is a best linear unbiased estimator (BLUE).

(b) Examine the efficiency of the OLS estimator of $\beta$ when each of the
$k$ columns of $X$ is also an eigenvector of $\Sigma$ $(k < T)$.

(c) Find a matrix of $k$ $(< T)$ explanatory variables $X$ for which OLS is
BLUE when


Le Fest 0 0 0 
sil Yale 0 
x = o? 
0 0 0 ye | 
0 0 0 ma 1 


and comment on ».
Question 2.13


(a) Derive the best linear unbiased predictor of $y_t$ $(t = T+1, \ldots, T+R)$
for the linear model $y_t = x_t'\beta + u_t$ with observations $t = 1, \ldots, T$ and
where the $u_t$ are distributed as independent $N(0, \sigma^2)$ variables.

(b) Find the covariance matrix of the forecast errors. 

(c) Hence, construct a 95% confidence interval for a one-period ahead 
forecast based on the estimated model 


vs = 100+ BS a 7 2X25 


where 
(UR Teed a) 
X'X *= 110.26. 0 
20r 05 90 


$s = 3$, $T = 10$, $x_{1,T+1} = 1$ and $x_{2,T+1} = 2$ ($s^2$ is the unbiased estimator of
$\sigma^2$ based on the OLS residuals).


Question 2.14 


If the $r$ component column vector $\hat{y}_F$ is the best linear unbiased predictor
of $y_F$ based on the linear model $y = X\beta + u$ where $u$ is $N(0, \sigma^2 I_T)$,

(a) derive a test of the hypothesis $H_0: E(\hat{y}_F - y_F) = 0$ against
$H_1: E(\hat{y}_F - y_F) \neq 0$; and

(b) compare this test with the Chow test for structural change.


Question 2.15 


(a) Define the term ‘consistent estimator’. 
(b) For the linear model

\[ y = X\beta + u \tag{2.15.1} \]

where $E(u) = 0$ and $E(uu') = \sigma^2 I_T$ show that the condition

\[ \lim_{T \to \infty} \frac{X'X}{T} = M, \]

where $M$ is finite and non-singular, is sufficient but not necessary for the
consistency of $b$, the OLS estimator of $\beta$.
(c) Prove the consistency of the OLS estimator of $\beta$ in the linear model

\[ y_t = \alpha + \beta t + u_t \qquad (t = 1, \ldots, T) \tag{2.15.2} \]

where $E(u_t) = 0$, $E(u_t^2) = \sigma^2$ and $E(u_t u_s) = 0$ for $t \neq s$.


Question 2.16


(a) Stating carefully any assumptions that you make, derive the
asymptotic distribution of the least squares estimator of $\beta$ in the linear
model

\[ y = X\beta + u \tag{2.16.1} \]

where $E(u) = 0$ and $E(uu') = \sigma^2 I_T$.
(b) Evaluate this distribution when 


56 simian allots! 
lim —— = | (2.16.2) 




(c) Obtain the asymptotic distribution of the OLS estimator of $\beta$ in the
model

\[ y_t = \alpha + \beta t + u_t \qquad (t = 1, \ldots, T) \tag{2.16.3} \]

where $E(u_t) = 0$, $E(u_t^2) = \sigma^2$ and $E(u_t u_s) = 0$ for $t \neq s$.


Question 2.17 


The demand for Ceylon tea in the US is assumed to be determined by the
equation

\[ \ln Q = \beta_0 + \beta_1 \ln P_C + \beta_2 \ln P_I + \beta_3 \ln P_B + \beta_4 \ln Y + u \tag{2.17.1} \]

where Q   = imports of Ceylon tea into the US
      P_C = price of Ceylon tea
      P_I = price of Indian tea
      P_B = price of Brazilian coffee
      Y   = disposable income


The following OLS estimates were obtained from T = 22 observations:


ln Q = 2.837 − 1.481 ln P_C + 1.181 ln P_I + 0.186 ln P_B + 0.257 ln Y
      (2.000)  (0.987)         (0.690)         (0.134)         (0.370)
                                              RSS = 0.4277        (2.17.2)

ln Q + ln P_C = −0.738 + 0.199 ln P_B + 0.261 ln Y
                (0.820)  (0.155)         (0.165)
                                              RSS = 0.6788        (2.17.3)


The figures in parentheses are standard errors.
(a) Test the hypothesis $H_0: \beta_1 = -1$, $\beta_2 = 0$ and $\beta_0, \beta_3, \beta_4 \neq 0$ against
$H_1: \beta_i \neq 0$ $(i = 0, 1, \ldots, 4)$.
(b) Discuss the economic implications of these results.


Question 2.18 
The following equation has been estimated by OLS: 


a = 1-551 10.6544in W=-5:0,0286.1n0 R? = 0.50 


(0.504) (0.266) (0.0403) (2.18.1) 


where Q = output, L = labour, W = product wage and the figures in 
parentheses are standard errors. It was derived from the marginal 
productivity condition for labour from the CES production function 


eg oN it gear] @ Uoeet 3? Coes (2.18.2) 


where K = capital stock, on the assumption that labour is paid its 
marginal product. 

(a) From (2.18.1) obtain estimates of $\sigma$, the elasticity of substitution
between labour and capital, and of $\nu$, the degree of returns to scale.

(b) Test the following hypotheses: (i) $H_0: \sigma = 1$ against $H_1: \sigma \neq 1$; and
(ii) $H_0': \nu = 1$ against $H_1': \nu \neq 1$. You may assume that the covariance
between the estimates of the coefficients of $\ln W$ and $\ln Q$ is zero.


Question 2.19 


In a study of investment plans and realisations in UK manufacturing 
industries since 1955 the following results were obtained by least squares 
regressions: 


A_t = const − 54.60 C_{t−1},   R² = 0.89; DW = 2.50                 (2.19.1)
              (6.24)

I_t − A_t = const − 19.96 (C_t − C_{t−1}),   R² = 0.68; DW = 2.31   (2.19.2)
                    (4.44)


I_t = const + 0.88 A_t − 16.32 (C_t − C_{t−1}),
              (0.10)     (5.15)

R² = 0.90; DW = 1.65                                                (2.19.3)
I_t = const − 50.08 C_{t−1} − 14.60 (C_t − C_{t−1}),
              (3.64)          (3.32)
R² = 0.96; DW = 2.61                                                (2.19.4)


where A_t = the investment that firms anticipate they will complete in
            year t; these plans are held at the end of year t − 1;
      I_t = actual investment in year t, i.e. the realisation of the plans;
      C_t = a measure of the average level of under-utilisation of capacity
            in year t; the greater is C_t, the more under-utilised is existing
            productive capacity;

and the figures in parentheses are standard errors. 

(a) Interpret these results and assess whether or not knowledge of firms’ 
anticipated investment is helpful in explaining actual investment. 

(b) Discuss any policy implications of these results for maintaining a 
steady level of investment. 


Source: Smyth and Briscoe (1969). 
(Adapted from University of Bristol B Sc. examinations, 1971.) 




Question 2.20 


Three hypotheses have been advanced to explain price formation: 

A. The theory of marginal cost pricing which asserts that changes in prices
(Δp) are due to changes in unit labour costs (ΔULC), changes in unit
material costs (ΔUMC), changes in the ratio of unfilled orders to sales
[Δ(O/S)] and the level of capacity utilisation (CU).

B. The theory of target return pricing in which price changes are due to
changes in standard or normal unit labour costs (ΔULC^N), changes in
standard unit material costs (ΔUMC^N), changes in the standard capital-
output ratio [Δ(K/Q)] and changes in target rates of return (Δπ).

C. The theory of full cost pricing which explains price changes by ΔULC^N
and ΔUMC^N only.

The following equations were estimated by OLS for manufacturing 
industries in the US 1954 (1)—1965(1): 


Δp = const + 0.068 ΔULC + 0.072 ΔUMC + 0.0005 CU
             (1.69)       (3.00)       (5.72)

     + 0.127 Δ(O/S)_{−1}
       (2.00)

R² = 0.583                                                          (2.20.1)


Δp = const + 0.235 ΔULC^N + 0.085 Δ(ULC − ULC^N)
             (2.70)         (2.11)

     + 0.155 ΔULC^N_{−1} + 0.065 ΔUMC + 0.0003 CU
       (2.52)              (2.98)       (3.08)

     + 0.141 Δ(O/S)_{−1}
       (2.21)

R² = 0.654.                                                         (2.20.2)


The figures in parentheses are ¢ statistics. 




(a) Discuss briefly the economic reasoning behind these specifications of 
the three theories. 


(b) Evaluate the three hypotheses. 
(Source: Eckstein and Fromm, 1968). 


2. SUPPLEMENTARY QUESTIONS 


Question 2.21 
Consider the linear model

\[ y_t = \beta_0 + \beta_1 x_{1t} + \beta_2 x_{2t} + u_t \qquad (t = 1, \ldots, T) \]

where $\beta_1 + \beta_2 = 0$ and the $u_t$ $(t = 1, \ldots, T)$ are serially independent
random disturbances with zero mean and variance $\sigma^2$. From observations
of $y_t$, $x_{1t}$ and $x_{2t}$ over $T = 103$ periods the following sample moments
matrix has been calculated:


5 ama) 390 30 —a2 0) 
x4 ates 30 20 10 
ee, & 290; ) 10 10 


Using these data, calculate the best linear unbiased estimates of $\beta_1$ and
$\beta_2$. Find the standard errors of your parameter estimates.


Question 2.22 
In the model

\[ y_t = \alpha + \beta x_t + u_t \qquad (t = 1, \ldots, T) \tag{2.22.1} \]


$x_t$ is a non-random exogenous variable and the $u_t$ are serially uncorrelated
random disturbances with zero mean and variance $\sigma^2$ for each value of $t$.

The following sample moments have been calculated from 10
observations of $y_t$ and $x_t$:

\[ \sum y_t = 8, \quad \sum x_t = 40, \quad \sum y_t^2 = 26, \quad \sum x_t^2 = 200, \quad \sum y_t x_t = 20 \]

where the summations are over $t = 1, 2, \ldots, 10$.
In some subsequent time period, $s$, for which the model (2.22.1) still
holds the value of $x_s = 10$.
(a) Calculate the best linear unbiased forecast of $y_s$, using the above data.
(b) Estimate the standard error of your forecast in (a).


(Adapted from University of Birmingham B Soc.Sc. examinations, 1966.) 


Question 2.23 


> s ~ ere 
(a) Let $\hat{\beta}_1$ and $\hat{\beta}_2$ denote the estimated regression coefficients for the
linear model

\[ y = X_1\beta_1 + X_2\beta_2 + u. \]

Show that

\[ b_1 = \hat{\beta}_1 + (X_1'X_1)^{-1}X_1'X_2\hat{\beta}_2 \]

where $b_1$ is a vector of coefficients from the regression of $y$ on $X_1$.

(b) Given the following estimated regression equations 
C_t = const + 0.92 Y_t + û_{1t}
C_t = const + 0.84 C_{t−1} + û_{2t}
C_{t−1} = const + 0.78 Y_t + û_{3t}
Y,=sconst + 0505C,2)° wa 
Calculate the regression estimates of $\beta_1$ and $\beta_2$ for

\[ C_t = \beta_0 + \beta_1 Y_t + \beta_2 C_{t-1} + u_t. \]


(University of Essex BA examinations, 1977.) 


Question 2.24 


A researcher obtained by ordinary least squares (OLS) the following
model estimates: $y = a + bx + e$. Another specification tried was:
$y = a^* + b^*x + c^*z + u$, again estimated by OLS.

Explain in detail under what circumstances the following could be true:
(a) $b^* = b$,

(b) $\sum u^2 < \sum e^2$,

(c) $b$ is statistically significant at the 5% level yet $b^*$ is not,

(d) $b^*$ is statistically significant at the 5% level yet $b$ is not.


(University of London M Sc.(Econ.) examinations, 1977.)


Question 2.25


The model $y_t = \beta_0 + \beta_1 x_{1t} + \beta_2 x_{2t} + \beta_3 x_{3t} + u_t$ was estimated by
ordinary least squares from 26 observations yielding:
Vite 2+ SR 0. Tay a2 Darke, Se
(1.9) (2.2) (1.5) 


with t-ratios in brackets and R? = 0.982. 


The same model was estimated with the restriction 6, = 8,. Estimates 
were: 


Pepe kOe 25 (X42 F X34) — 0.6 x94, + 


(2.7) (2.4) 
R* = 0.876. 
(a) Test the significance of the vector $(\beta_1, \beta_2, \beta_3)'$ from the unrestricted
estimates.

(b) Test the significance of the restriction B, = B,. Clearly state any 
assumptions utilised. 


(University of London M Sc.(Econ.) examinations, 1977.) 


Question 2.26 


The simple accelerator theory of investment behaviour says that net
investment ($I_t^N$) increases as the change in output ($Q_t$) increases, i.e.

\[ I_t^N = \alpha(Q_t - Q_{t-1}) \qquad (\alpha > 0). \tag{2.26.1} \]

In order to test this theory three more equations are used:

\[ I_t = I_t^N + I_t^R \tag{2.26.2} \]

i.e. gross investment ($I_t$) equals net investment plus replacement
investment ($I_t^R$).

\[ I_t^R = \delta K_{t-1} \tag{2.26.3} \]

i.e. a constant proportion $\delta$ of last period's capital stock ($K_{t-1}$) is replaced
each year.

\[ K_t / Q_t = \alpha \tag{2.26.4} \]

i.e. the capital-output ratio is a constant $\alpha$.
Given the following estimated equation for the US 1946-59:

\[ I_t = 0.245\,Q_t - 0.102\,K_{t-1}, \qquad R^2 = 0.966 \tag{2.26.5} \]
ise | (23.6) 
where the figures in parentheses are the ratios of the estimates to the
standard errors, i.e. the $t$ statistics:
(a) Combine equations (2.26.1-4) so that an equation like (2.26.5) is
obtained.
(b) Deduce estimates of $\alpha$ and $\delta$.
(c) It is believed that the capital-output ratio is approximately 3. Test the
hypothesis that $\alpha = 3$.


(University of Essex BA examinations, 1975.) 


Question 2.27 
In the model 
Sree Pkg thy (= yaw sl) (2.27.1) 


$y_t$ is endogenous, $x_t$ is a non-random exogenous variable and the $u_t$ are
serially uncorrelated random disturbances each with zero mean and
variance $\sigma^2$.

The $T$ observations are arranged into $n$ groups. The $i$th group contains
$T_i$ observations of $y$ and $x$ and $\sum_{i=1}^{n} T_i = T$. The means of $y$ and $x$ in the
$i$th group are denoted by $\bar{y}_i$ and $\bar{x}_i$ $(i = 1, \ldots, n)$.

Suppose that only the data on the group means $[(\bar{y}_i, \bar{x}_i): i = 1, \ldots, n]$
are available.

(a) Write down the best linear unbiased estimator (BLUE) of $\beta$ in (2.27.1).
(b) Derive an expression for the variance of your estimator in (a).

(c) Find the variance of the ordinary least squares estimator (OLS) of $\beta$
and hence derive an expression for the efficiency of the OLS estimator
relative to the BLUE.

(d) What is this efficiency (numerically) when $T_1 = T_2 = \ldots = T_n$?

Now suppose that the underlying data $[(y_t, x_t): t = 1, \ldots, T]$ are
available.

(e) Write down the BLUE of $\beta$ in this case and verify that the variance of
this estimator is at least as small as the variance of your estimator in (a)
when the available data were on group means.


(Adapted, in part, from University of Birmingham B Soc.Sc. examinations,
1966.)


Question 2.28 
For the linear model

\[ y = X\beta + u \tag{2.28.1} \]

where $u$ is $N(0, \sigma^2 I_T)$ we wish to test the hypothesis that the coefficient
vector $\beta$ satisfies the independent linear restrictions

\[ R\beta = r \tag{2.28.2} \]

where $R$ is a known $s \times k$ matrix and $r$ is a known $s \times 1$ vector.
(a) Derive the Lagrange multiplier statistic (LM), the likelihood ratio
statistic (LR) and the Wald statistic (W) for testing (2.28.2) against the
hypothesis that $\beta$ does not satisfy (2.28.2).
(b) Show that

\[ \mathrm{LM} \le \mathrm{LR} \le \mathrm{W}. \tag{2.28.3} \]

(c) Comment on the relationship between these three statistics and the
appropriate finite sample statistic for performing this test.


Reference: Berndt and Savin (1977) and Breusch (1976).


Question 2.29 
(a) For the model $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \epsilon_i$ where the X's are fixed
in repeated samples, $E(\epsilon_i) = 0$, $E(\epsilon_i \epsilon_j) = \sigma^2$ for $i = j$ and $0$ for $i \ne j$,
and $\epsilon_i$ is normally distributed, derive a test statistic for the hypothesis
that $X_2$ has no influence on $Y$.
(b) Discuss the validity of the following procedure for testing the same
hypothesis.
(i) Use the estimated regression $Y_i = \hat\gamma_0 + \hat\gamma_1 X_{1i} + V_i$ and regress $V_i$ on $X_{2i}$
to give:

$V_i = \hat\delta_0 + \hat\delta_1 X_{2i} + w_i$
(ii) Test the null hypothesis $H_0: \delta_1 = 0$.


(University of London B Sc.(Econ.) examinations, 1975.) 


Question 2.30 
Given a CES production function 
$V = A[\delta K^{-\rho} + (1 - \delta) L^{-\rho}]^{-1/\rho}$    (2.30.1)


where $V$, $K$, $L$ are output, capital and labour respectively, and given data
on these variables together with the real wage rate ($w$), an investigator has
a choice of estimating either of the following two equations:


$\ln (V/L)_t = \beta_0 + \beta_1 \ln w_t + u_{1t}$    (2.30.2)


$\ln w_t = \gamma_0 + \gamma_1 \ln (V/L)_t + u_{2t}$    (2.30.3)


(a) What assumptions about the nature of the product and factor markets
are crucial to the choice between (2.30.2) and (2.30.3)?

(b) Show that $\beta_1$ is the elasticity of substitution.

(c) If $\hat\beta_1$ and $\hat\gamma_1$ are the least squares estimates of $\beta_1$ and $\gamma_1$, show that


$\hat\beta_1 \hat\gamma_1 \le 1$.

(University of London, M Sc.(Econ.) examinations, 1970.)


Question 2.31 
(a) The following relationship holds for OLS estimates:


$\frac{\hat\beta_i^2}{\widehat{\mathrm{var}}(\hat\beta_i)} = \frac{(T - k)\, r_{1i.q}^2}{1 - r_{1i.q}^2}$

where $\hat\beta_i$ = OLS estimator of the coefficient of $x_i$ $(i = 2, 3, \ldots, k + 1)$

$x_{k+1} = 1$ for all observations (i.e. constant term)

$T$ = number of observations

$\widehat{\mathrm{var}}(\hat\beta_i)$ = estimated variance of $\hat\beta_i$

$r_{1i.q}$ = partial correlation coefficient between the dependent variable
$x_1$ and $x_i$ $(i \ne 1, i \ne k + 1)$ in the regression including all the
other x's.

Using this relationship, discuss the effect of varying degrees of collinearity
among the regressors $x_2, \ldots, x_k$ on the OLS estimates $\hat\beta_i$.


(b) In the model $x_1 = \beta_2 x_2 + \beta_3 x_3 + \beta_4 + u$ an extraneous estimate $\tilde\beta_2$ of
$\beta_2$ is available. Discuss how this information can be used to obtain a better
estimate of $\beta_3$ than the OLS estimate.


(University of London B Sc.(Econ.) examinations, 1975.) 


Question 2.32 


The matrix $X$ of the linear model $y = X\beta + u$ is partitioned into
submatrices $X_1$ and $X_2$ of rank $k_1$ and $k_2$ respectively, where $k_1 + k_2 = k$,
and the vector $\beta$ is partitioned conformably. Consider the following
procedures for estimating $\beta_1$, some of which utilise the residuals from
prior regressions of $y$ and/or $X_1$ on $X_2$:

(a) Regress the residuals from the regression of $y$ on $X_2$ on those from
the regressions of $X_1$ on $X_2$.

(b) Regress $y$ on the residuals from the regressions of $X_1$ on $X_2$.

(c) Regress $y$ on $X_2$ and the residuals from the regressions of $X_1$ on $X_2$.
(d) Direct regression of $y$ on $X_1$ and $X_2$.

Show that the estimate of $\beta_1$ is the same in all four cases, and comment
on this result.

Compare the residual sum of squares and $R^2$ values obtained from each
of the four regressions.


(University of London B Sc(Econ) examinations, 1976.) 


Question 2.33 
Consider the linear model 
$y_t = \beta x_t + u_t \quad (t = 1, \ldots, T)$


where $E(u_t) = 0$ and the covariance matrix of the $u_t$ is known. Under
what assumptions would 


CU hey a > (ye/x+), and 


be best linear unbiased estimators? 


Question 2.34 
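For the linear model


$y_t = \sum_{i=1}^{k} x_{it}\beta_i + u_t \quad (t = 1, \ldots, T)$

$y_t = \sum_{i=1}^{k} x_{it}\beta_i + \delta_t + u_t \quad (t = T + 1, \ldots, T + R)$

(a) interpret the OLS estimators of $\beta_i$ $(i = 1, \ldots, k)$ and $\delta_t$ $(t = T + 1, \ldots,
T + R)$ obtained using the complete sample $t = 1, \ldots, T + R$ and also the
covariance matrix of these estimators.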
(b) Interpret the multiple correlation coefficient when 


Vee Val Olen Foal til) tretied tye 
Reference: D.S. Salkever (1976). 


Question 2.35 
In the model 
$y_t = \beta x_t + u_t \quad (t = 1, \ldots, T)$


the $x_t$ $(t = 1, \ldots, T)$ are non-random positive quantities, $y_t$ is an
observable random variable and the $u_t$ are serially uncorrelated random
disturbances distributed with zero means and variances given by

$E(u_t^2) = \sigma^2 x_t^p \quad (p > 0)$


It is known that the jth moment about the origin of the quantities 
RR A Te nore, ee aS a 


Dy, 
pt ad = co 2 0 Re Aes a) 


t=1 
where w > 0 and 77 is the coefficient of variation of the x;. 


(a) Find expressions as functions of $p$ and $\eta$ for the efficiencies of the
following estimators of $\beta$ relative to the BLUE of $\beta$:


$b_1 = \sum_{t=1}^{T} x_t y_t \Big/ \sum_{t=1}^{T} x_t^2$


$b_2 = \sum_{t=1}^{T} y_t \Big/ \sum_{t=1}^{T} x_t$


(b) Evaluate the efficiencies of $b_1$ and of $b_2$ numerically for $p = 1$ and
$\eta = \tfrac{1}{2}$. Comment on your results.


(Adapted from University of Birmingham B Soc.Sc. examinations, 1966.)


Question 2.36 
In the model: 
$y = X\beta + u$


where $y$ is $T \times 1$, $X$ is a fixed $T \times k$ matrix of rank $k$ and $u$ is $N(0, \sigma^2 I_T)$,
consider imposing the restriction $R\beta = 0$ where $R$ is a known $r \times k$
matrix of rank $r$.

(a) Show that the difference between the restricted and unrestricted
residual sums of squares in this model is given by


$(Rb)'[R(X'X)^{-1}R']^{-1}(Rb)$


where $b$ denotes the ordinary least squares estimator of $\beta$.
(b) For the particular case:


$y_t = \alpha_1 x_{1t} + \alpha_2 x_{2t} + \alpha_3 x_{3t} + u_t \quad (t = 1, \ldots, T)$


a sample of size $T = 46$ yields the following data on sums of squares and
cross-products of the variables:


        y     x1    x2    x3
y       8     0     3     1
x1      0     2     1     2
x2      3     1     [illegible]     1
x3      1     2     1     4


Carry out a test of

$H_0: \alpha_1 + \alpha_2 - \alpha_3 = 0$ and $\alpha_1 - \alpha_2 + 2\alpha_3 = 0$
against $H_1$: $H_0$ is false.
(University of Essex MA examinations, 1977.)


Question 2.37 


(a) $\{a_T: T = 1, 2, \ldots\}$ is a sequence of $n \times 1$ random vectors that
converges in probability to the constant vector $a^0$. It is known that
$\sqrt{T}(a_T - a^0)$ has a limiting normal distribution with zero mean and
non-singular covariance matrix $\Sigma$. If $f$ is a continuous (scalar) function
with continuous derivatives to the second order (at least), find the
limiting distribution of $\sqrt{T}[f(a_T) - f(a^0)]$.
(b) If $b_1$ and $b_2$ denote the least squares estimates of $\beta_1$ and $\beta_2$ in the
model

$y_t = \beta_1 x_{1t} + \beta_2 x_{2t} + u_t \quad (t = 1, \ldots, T)$
under what conditions will


$\sqrt{T}(b_1/b_2 - \beta_1/\beta_2)$


have a limiting normal distribution? What is the variance of this limiting
distribution?


Question 2.38 


(a) If $X_1, X_2, \ldots, X_T$ is a random sample from a population with
distribution function $F(X)$ and frequency function $f(X)$, what conditions
on this distribution are sufficient to ensure that the sample moment
$(1/T)\sum_{t=1}^{T} X_t^r$ tends in probability to the $r$th moment of the distribution
($r$ integer and $r > 1$) as $T$ tends to infinity? Prove your result.

(b) Consider the linear model $y = X\beta + u$ where $E(u) = 0$ and $E(uu') =
\sigma^2 I$. If $b$ is the least squares estimator of $\beta$ show that


$s^2 = (y - Xb)'(y - Xb)/(T - k)$
is a consistent estimator of $\sigma^2$.
(Adapted from University of Essex BA examinations, 1973.)


Question 2.39


In the following three models 


vee pene (t= 1, we) (2239-1)) 

Ver BOR Tur Cae rs, ; (2.39.2) 

sf Seis (¢=1, la) (2:39;5) 
the $y_t$ $(t = 1, \ldots, T)$ are observable random variables and the $u_t$
$(t = 1, \ldots, T)$ are serially independent random disturbances with zero
mean and finite variance $\sigma^2$. In (2.39.2), $g$ is a known constant.

(a) Show that the least squares estimator of the parameter $\alpha$ in (2.39.1)
cannot be consistent. Is the same true for (2.39.2)?

(b) Examine whether the least squares estimators of $\alpha$ and $\beta$ are consistent
in (2.39.3).


$\left[\text{Hint: } \sum_{t=1}^{\infty} \frac{1}{t^2} = \frac{\pi^2}{6}, \quad \sum_{t=1}^{\infty} \frac{1}{t^4} = \frac{\pi^4}{90}\right]$


(Adapted from University of Essex BA examinations, 1976.) 


3. SOLUTIONS 
Solution 2.1 


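Part (a). Using the notation discussed in the introduction to this chapter,
equation (2.1.1) can be written more compactly as


$y = X\beta + u$    (2.1.2)


where the $t$th row of $X$ is $(1, x_{1t}, x_{2t})$ and $\beta = (\alpha, \beta, \gamma)'$. We can also
partition $X$ as $(\iota : X_1 : X_2)$ where $\iota$ denotes a column vector of ones and
$X_1$ and $X_2$ are column vectors with $t$th elements $x_{1t}$ and $x_{2t}$ respectively.
The least squares estimator of $\beta$ defined in (2.1.2) is


$b_{OLS} = (X'X)^{-1} X'y$    (2.1.3)


Denoting the least squares estimators of $\alpha$, $\beta$ and $\gamma$ by $a$, $b$ and $c$, and
substituting for $X$ in (2.1.3) we obtain

$\begin{pmatrix} a \\ b \\ c \end{pmatrix} = [(\iota\; X_1\; X_2)'(\iota\; X_1\; X_2)]^{-1} (\iota\; X_1\; X_2)'y = \begin{pmatrix} T & T\bar{x}_1 & T\bar{x}_2 \\ T\bar{x}_1 & X_1'X_1 & X_1'X_2 \\ T\bar{x}_2 & X_2'X_1 & X_2'X_2 \end{pmatrix}^{-1} \begin{pmatrix} T\bar{y} \\ X_1'y \\ X_2'y \end{pmatrix}$

or

$\begin{pmatrix} a \\ b^* \end{pmatrix} = \begin{pmatrix} 1 & \bar{x} \\ \bar{x}' & M_{xx} \end{pmatrix}^{-1} \begin{pmatrix} \bar{y} \\ M_{xy} \end{pmatrix}$    (2.1.4)

where

$\bar{x} = (\bar{x}_1, \bar{x}_2), \quad M_{xx} = \frac{1}{T}\begin{pmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{pmatrix}, \quad M_{xy} = \frac{1}{T}\begin{pmatrix} X_1'y \\ X_2'y \end{pmatrix}$


and $b^{*\prime} = (b, c)$.
Pre-multiplying (2.1.4) by


$\begin{pmatrix} 1 & \bar{x} \\ \bar{x}' & M_{xx} \end{pmatrix}$

we find that

$a + \bar{x} b^* = \bar{y}$    (2.1.5)
$\bar{x}' a + M_{xx} b^* = M_{xy}$    (2.1.6)


Substituting (2.1.5) into (2.1.6) and simplifying we obtain


$a = \bar{y} - \bar{x} b^*$    (2.1.7)
$b^* = m_{xx}^{-1} m_{xy}$    (2.1.8)


where $m_{xx} = M_{xx} - \bar{x}'\bar{x}$, $m_{xy} = M_{xy} - \bar{x}'\bar{y}$, the $ij$th element of $m_{xx}$ is


$\frac{1}{T}\sum_{t=1}^{T} (x_{it} - \bar{x}_i)(x_{jt} - \bar{x}_j) \quad (i, j = 1, 2)$


and the $i$th element of $m_{xy}$ is


$\frac{1}{T}\sum_{t=1}^{T} (x_{it} - \bar{x}_i)(y_t - \bar{y}) \quad (i = 1, 2)$


We can now substitute directly into (2.1.7) and (2.1.8) to obtain the
required estimates. Thus


$b^* = \begin{pmatrix} 10 & 5 \\ 5 & 5 \end{pmatrix}^{-1} \begin{pmatrix} 100 \\ 90 \end{pmatrix} = \frac{1}{25}\begin{pmatrix} 5 & -5 \\ -5 & 10 \end{pmatrix} \begin{pmatrix} 100 \\ 90 \end{pmatrix} = \begin{pmatrix} 2 \\ 16 \end{pmatrix}$


and

$a = 1200 - (100, 50)\begin{pmatrix} 2 \\ 16 \end{pmatrix} = 1200 - 1000 = 200.$

The estimated line is therefore

$\hat{y} = 200 + 2x_1 + 16x_2$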
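
The arithmetic of (2.1.7) and (2.1.8) can be checked with a short NumPy sketch. The variable names are illustrative, and the second entry of the cross-moment vector (90) is an inferred value, chosen because it is the only one consistent with the reported estimates $b = 2$, $c = 16$.

```python
import numpy as np

# Moments about the mean for the data of question (2.1), T = 100.
m_xx = np.array([[10.0, 5.0],
                 [5.0, 5.0]])    # second moments of x1, x2 about their means
m_xy = np.array([100.0, 90.0])   # cross moments with y (second entry inferred)
x_bar = np.array([100.0, 50.0])  # sample means of x1 and x2
y_bar = 1200.0                   # sample mean of y

# (2.1.8): slope estimates b* = m_xx^{-1} m_xy
b_star = np.linalg.solve(m_xx, m_xy)

# (2.1.7): intercept a = y_bar - x_bar b*
a = y_bar - x_bar @ b_star

print("b, c =", b_star)
print("a    =", a)
```

Solving the linear system directly, rather than forming the inverse, is a numerical convenience only; it returns the same estimates as the hand calculation.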
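Part (b). We estimate $\sigma^2$ by $s^2 = \hat{u}'\hat{u}/(T - k)$. But

$\hat{u}'\hat{u} = (y - Xb_{OLS})'(y - Xb_{OLS}) = \sum_{t=1}^{T} (y_t - a - b x_{1t} - c x_{2t})^2$


or, from (2.1.7),


$\hat{u}'\hat{u} = \sum_{t=1}^{T} [(y_t - \bar{y}) - b(x_{1t} - \bar{x}_1) - c(x_{2t} - \bar{x}_2)]^2$

$= \sum (y_t - \bar{y})^2 - 2b\sum (x_{1t} - \bar{x}_1)(y_t - \bar{y}) - 2c\sum (x_{2t} - \bar{x}_2)(y_t - \bar{y}) + b^2 \sum (x_{1t} - \bar{x}_1)^2 + 2bc \sum (x_{1t} - \bar{x}_1)(x_{2t} - \bar{x}_2) + c^2 \sum (x_{2t} - \bar{x}_2)^2$

$= T(m_{yy} - 2b^{*\prime} m_{xy} + b^{*\prime} m_{xx} b^*)$


where $m_{yy} = (1/T)\sum_{t=1}^{T} (y_t - \bar{y})^2$.
Using (2.1.8) we find that $m_{xy} = m_{xx} b^*$ and hence that


$\hat{u}'\hat{u} = T(m_{yy} - b^{*\prime} m_{xx} b^*)$    (2.1.9)
Thus


$\hat{u}'\hat{u}/T = 2000 - (2, 16)\begin{pmatrix} 10 & 5 \\ 5 & 5 \end{pmatrix}\begin{pmatrix} 2 \\ 16 \end{pmatrix}$


$= 2000 - 1640 = 360$
and hence


$s^2 = 100 \times 360/(100 - 3) = 371.134.$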


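Part (c). In order to derive the covariance matrix of the estimates we shall
make use of the following results on the inverse of a partitioned matrix,
Theil (1971, p. 18)


$\begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1} = \begin{pmatrix} A^{-1} + A^{-1}B(D - CA^{-1}B)^{-1}CA^{-1} & -A^{-1}B(D - CA^{-1}B)^{-1} \\ -(D - CA^{-1}B)^{-1}CA^{-1} & (D - CA^{-1}B)^{-1} \end{pmatrix}$    (2.1.10)

where $A$ is assumed to be non-singular. An important special case in
econometrics is where the partitioned matrix is symmetric. Thus from
(2.1.10):

$\begin{pmatrix} X'X & X'Z \\ Z'X & Z'Z \end{pmatrix}^{-1} = \begin{pmatrix} (X'X)^{-1}[I + X'Z(Z'QZ)^{-1}Z'X(X'X)^{-1}] & -(X'X)^{-1}X'Z(Z'QZ)^{-1} \\ -(Z'QZ)^{-1}Z'X(X'X)^{-1} & (Z'QZ)^{-1} \end{pmatrix}$    (2.1.11)


where $Q = I - X(X'X)^{-1}X'$.
Now the estimated covariance matrix of the estimates is


$s^2 (X'X)^{-1} = \frac{s^2}{T}\begin{pmatrix} 1 & \bar{x} \\ \bar{x}' & M_{xx} \end{pmatrix}^{-1}$

Using (2.1.11) this is

$\frac{s^2}{T}\begin{pmatrix} 1 + \bar{x} m_{xx}^{-1} \bar{x}' & -\bar{x} m_{xx}^{-1} \\ -m_{xx}^{-1}\bar{x}' & m_{xx}^{-1} \end{pmatrix}$


where $m_{xx} = M_{xx} - \bar{x}'\bar{x}$ plays the role of $Z'QZ$.
Hence the covariance matrix of $a$, $b$ and $c$ is


$\frac{371.134}{100}\begin{pmatrix} 1 + (100, 50)\begin{pmatrix} 10 & 5 \\ 5 & 5 \end{pmatrix}^{-1}\begin{pmatrix} 100 \\ 50 \end{pmatrix} & -(100, 50)\begin{pmatrix} 10 & 5 \\ 5 & 5 \end{pmatrix}^{-1} \\ -\begin{pmatrix} 10 & 5 \\ 5 & 5 \end{pmatrix}^{-1}\begin{pmatrix} 100 \\ 50 \end{pmatrix} & \begin{pmatrix} 10 & 5 \\ 5 & 5 \end{pmatrix}^{-1} \end{pmatrix}$


$= 3.71134 \begin{pmatrix} 1001 & -10 & 0 \\ -10 & 0.2 & -0.2 \\ 0 & -0.2 & 0.4 \end{pmatrix}$


$= \begin{pmatrix} 3715.05 & -37.113 & 0 \\ -37.113 & 0.7423 & -0.7423 \\ 0 & -0.7423 & 1.4845 \end{pmatrix}$


The standard errors of $a$, $b$ and $c$ are the square roots of the respective
elements on the leading diagonal of the covariance matrix. We write these
as
$s_a = \sqrt{3715.05} = 60.95$, $s_b = \sqrt{0.7423} = 0.862$ and $s_c = \sqrt{1.4845} = 1.218$.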
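
The residual variance of Part (b) and the partitioned-inverse covariance matrix of Part (c) can be reproduced numerically. The sketch below assumes the moment values quoted in this solution; the names are illustrative only.

```python
import numpy as np

T, k = 100, 3
m_xx = np.array([[10.0, 5.0], [5.0, 5.0]])  # moments of x1, x2 about their means
x_bar = np.array([100.0, 50.0])             # sample means of x1, x2
b_star = np.array([2.0, 16.0])              # slope estimates (b, c)
m_yy = 2000.0                               # second moment of y about its mean

# (2.1.9): residual sum of squares divided by T, then s^2 = RSS / (T - k)
rss_over_T = m_yy - b_star @ m_xx @ b_star
s2 = T * rss_over_T / (T - k)

# Blocks of the partitioned inverse, as in (2.1.11) applied to [[1, x_bar], [x_bar', M_xx]]
m_inv = np.linalg.inv(m_xx)
cov = np.empty((3, 3))
cov[0, 0] = 1.0 + x_bar @ m_inv @ x_bar
cov[0, 1:] = -x_bar @ m_inv
cov[1:, 0] = -m_inv @ x_bar
cov[1:, 1:] = m_inv
cov *= s2 / T

std_errors = np.sqrt(np.diag(cov))
print("s^2 =", round(s2, 3))
print(np.round(cov, 4))
print("standard errors:", np.round(std_errors, 3))
```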


Part (d). If uncorrected for the mean, $R^2$ measures the proportion
of the total sum of squares explained by the model. We distinguish this
case from the coefficient of determination, $R^2$, by using the symbol $R_u^2$.
Here we have


$R_u^2 = 1 - \frac{\hat{u}'\hat{u}}{y'y} = 1 - \frac{\hat{u}'\hat{u}/T}{m_{yy} + \bar{y}^2} = 1 - \frac{360}{2000 + 1200^2} = 0.99975$


(we retain 5 significant figures for use later).
It is more usual, however, to measure instead the proportion of the
variance of $y$ explained by the model, i.e.


$R^2 = 1 - \frac{\hat{u}'\hat{u}/T}{m_{yy}} = 1 - 360/2000 = 0.820$


$\bar{R}^2$, the corrected coefficient of determination, is defined as


$\bar{R}^2 = R^2 - \frac{k - 1}{T - k}(1 - R^2) = 0.82 - \frac{2}{97}(0.18) = 0.8163$


See Johnston (1972, pp. 129-135) and Theil (1971, ch. 4) for further
discussion of $R^2$ and questions (2.10) and (2.11).
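
The three goodness-of-fit measures of Part (d) differ only in the denominator or the degrees-of-freedom correction. A minimal sketch, using the numbers of this solution and illustrative variable names:

```python
T, k = 100, 3
m_yy = 2000.0        # second moment of y about its mean
y_bar = 1200.0       # sample mean of y
rss_over_T = 360.0   # residual sum of squares divided by T

# Uncorrected for the mean: total sum of squares is y'y = T (m_yy + y_bar^2)
r2_uncorrected = 1.0 - rss_over_T / (m_yy + y_bar**2)

# Usual coefficient of determination: proportion of the variance of y explained
r2 = 1.0 - rss_over_T / m_yy

# Corrected coefficient of determination
r2_adj = r2 - (k - 1) / (T - k) * (1.0 - r2)

print(round(r2_uncorrected, 5), round(r2, 3), round(r2_adj, 4))
```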


Solution 2.2 


Part (a). (i) First, consider the estimated regression


$y = Xb + \hat{u} = \hat{y} + \hat{u}$    (2.2.1)


where $b = (X'X)^{-1}X'y$ is the OLS estimator of $\beta$, $\hat{u} = y - Xb =
[I - X(X'X)^{-1}X']y$ is the vector of residuals and $\hat{y}$ is the fitted value of
$y$ from the regression. We introduce the idempotent (or projection)
matrices $P = X(X'X)^{-1}X'$ and $Q = I - P$. As $P$ is a symmetric idempotent
matrix, $P = P^2 = P'P$ and $PQ = QP = 0$. Hence, $\hat{y} = Py$ and $\hat{u} = (I - P)y = Qy$.


The sum of squares $y'y$ can now be shown to satisfy

$y'y = y'(P + Q)y$
$= y'Py + y'Qy$
$= y'P'Py + y'Q'Qy$
$= \hat{y}'\hat{y} + \hat{u}'\hat{u}$    (2.2.2)


We have, therefore, decomposed the total sum of squares $y'y$ (or TSS)
into two components: the explained sum of squares $\hat{y}'\hat{y}$ (or ESS) due to
the explanatory variables $X$ and the unexplained or residual sum of
squares $\hat{u}'\hat{u}$ (or RSS). Thus (2.2.2) can be written as


TSS = ESS + RSS    (2.2.3)


From this decomposition we can construct our test by checking whether
or not ESS is significantly different from zero; a significantly large ESS
implies that $H_0$ is incorrect and hence that $\beta \ne 0$.

We shall require the following results in order to find the distribution
of ESS (see Theil, 1971, pp. 137-143).

(A) If $y$ is $N(\mu, \Sigma)$ and $\Sigma$ is non-singular then $(y - \mu)'\Sigma^{-1}(y - \mu)$ is
distributed as a $\chi^2_T$ where $T$ is the size of $y$. Hence if $y$ is $N(0, \sigma^2 I_T)$
then $y'y/\sigma^2$ is distributed as $\chi^2_T$. Also, if $A_1$ and $A_2$ are two idempotent
matrices then

(B) $y'A_i y/\sigma^2$ is a $\chi^2_{\mathrm{tr}A_i}$ $(i = 1, 2)$

Further if $A_1 A_2 = 0$ then

(C) $y'A_1 y/\sigma^2$ and $y'A_2 y/\sigma^2$ are independently distributed, and

(D) $y'(A_1 + A_2)y/\sigma^2$ is distributed as $\chi^2_{\mathrm{tr}A_1 + \mathrm{tr}A_2}$

(E) $\frac{y'A_1 y}{y'A_2 y} \cdot \frac{\mathrm{tr}A_2}{\mathrm{tr}A_1}$ is distributed as $F_{\mathrm{tr}A_1, \mathrm{tr}A_2}$


Now on $H_0$, $y$ is $N(0, \sigma^2 I_T)$ and hence from (B) above ESS$/\sigma^2 =
y'Py/\sigma^2$ is distributed as $\chi^2_k$, since $P$ is idempotent and
$\mathrm{tr}P = \mathrm{tr}[X(X'X)^{-1}X'] = \mathrm{tr}[(X'X)^{-1}X'X] = \mathrm{tr}I_k = k$. Thus, if $\sigma^2$ is
known, $H_0$ will be rejected in favour of $H_1$ if ESS$/\sigma^2 > \chi^2_k(\alpha)$, where $\alpha$ is
the significance level of the test.

When $\sigma^2$ is unknown it can be estimated by $s^2 = \mathrm{RSS}/(T - k)$. The
above test statistic now becomes

$\frac{\mathrm{ESS}}{s^2} = \frac{y'Py/\sigma^2}{y'Qy/\sigma^2}(T - k) = \frac{\chi^2_k}{\chi^2_{T-k}}(T - k)$


since $y'Qy/\sigma^2$ is also a $\chi^2$ with $\mathrm{tr}Q = T - k$ degrees of freedom and, in
view of the fact that $PQ = 0$, it is distributed independently of $y'Py/\sigma^2$
(see (C) above). It follows that

$\frac{\mathrm{ESS}}{s^2 k} = \frac{\chi^2_k / k}{\chi^2_{T-k}/(T - k)}$


has an $F_{k, T-k}$ distribution. Thus, when $\sigma^2$ is unknown, $H_0$ is rejected
when ESS$/s^2 k > F_{k, T-k}(\alpha)$.


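Part (b). (i) From part (a) our required test statistic is ESS$/\sigma^2$, which in
this case is distributed as $\chi^2_3$. Now from question (2.1), writing $m_{yy}$ for
the second moment of $y$ about its mean,

$\mathrm{ESS} = y'y - \hat{u}'\hat{u} = T(m_{yy} + \bar{y}^2 - \hat{u}'\hat{u}/T)$
$= 100(2000 + 1200^2 - 360)$
$= 144{,}164{,}000$

Therefore

$\mathrm{ESS}/\sigma^2 = 144{,}164{,}000/400 = 360{,}410$


which is clearly significant. We can, therefore, reject $H_0$ in favour of $H_1$.
(ii) When $\sigma^2$ is unknown we use the test statistic

$\frac{\mathrm{ESS}}{s^2 k} = \frac{144{,}164{,}000}{371.134 \times 3} = 129{,}481$


which is distributed as an $F_{3,97}$. But $F_{3,97}(0.01) = 3.98$ so that we again
reject $H_0$ in favour of $H_1$.


Part (c). If we have data on $R_u^2$ (the measure uncorrected for the mean), $T$ and $k$,
then we can use the fact that
$\mathrm{ESS} = R_u^2 \times \mathrm{TSS}$ and $s^2 = (1 - R_u^2) \times \mathrm{TSS}/(T - k)$. Substituting these
into the test statistic ESS$/s^2 k$ we have

$\frac{\mathrm{ESS}}{s^2 k} = \frac{R_u^2}{1 - R_u^2} \cdot \frac{T - k}{k}$

For question (2.1) $R_u^2 = 0.99975$. Hence

$\frac{\mathrm{ESS}}{s^2 k} = \frac{0.99975}{0.00025} \cdot \frac{97}{3} = 129{,}481$


as before. (The figure 129,481 is obtained with the unrounded value of $R_u^2$;
the value rounded to five figures gives $3999 \times 97/3 \approx 129{,}301$.)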
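
The test statistics of Parts (b) and (c) can be recomputed from the summary moments. This is a sketch with illustrative names; it uses the unrounded uncorrected $R^2$, which is why both routes to the F statistic agree exactly.

```python
T, k = 100, 3
m_yy, y_bar, rss_over_T = 2000.0, 1200.0, 360.0
sigma2_known = 400.0              # value of sigma^2 used in Part (b)(i)

tss = T * (m_yy + y_bar**2)       # total sum of squares y'y
rss = T * rss_over_T              # residual sum of squares
ess = tss - rss                   # explained sum of squares

chi2_stat = ess / sigma2_known    # chi-square(3) statistic when sigma^2 is known

s2 = rss / (T - k)
f_stat = ess / (s2 * k)           # F(3, 97) statistic when sigma^2 is unknown

# Same statistic from the uncorrected R^2 alone, as in Part (c)
r2_u = ess / tss
f_from_r2 = r2_u / (1.0 - r2_u) * (T - k) / k

print(ess, chi2_stat, round(f_stat), round(f_from_r2))
```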


Solution 2.3


Part (a). On $H_0$ equation (2.3.1) becomes


$y = X_1\beta_1 + u$    (2.3.2)

whereas on $H_1$ we have

$y = X_1\beta_1 + X_2\beta_2 + u$    (2.3.3)


or, more compactly,

$y = X\beta + u$    (2.3.4)


where $X = (X_1 : X_2)$ and $\beta' = (\beta_1', \beta_2')$. On $H_0$ the total sum of squares
$y'y$ can be written as


$y'y = y'P_1 y + y'(I - P_1)y$

where $P_1 = X_1(X_1'X_1)^{-1}X_1'$. On $H_1$ we obtain

$y'y = y'Py + y'(I - P)y$
$= y'P_1 y + y'(P - P_1)y + y'(I - P)y$    (2.3.5)


where $P = X(X'X)^{-1}X'$. The term $y'P_1 y$ represents the explained sum of
squares due to $X_1$ and $y'(I - P)y$ is the residual sum of squares on $H_1$,
which we denote by $RSS_1$. $y'(P - P_1)y$ can be interpreted either as the
additional explained sum of squares due to adding $X_2$ given that $X_1$ is
already included, or as the reduction in the residual sum of squares on
$H_0$, namely $RSS_0 = y'(I - P_1)y$, due to adding $X_2$. That is,

$y'(P - P_1)y = y'Py - y'P_1 y = y'(I - P_1)y - y'(I - P)y = RSS_0 - RSS_1$

We can base a test of $H_0$ against $H_1$ on the observation that if $H_0$ is
correct then adding the variables $X_2$ to equation (2.3.2) will not increase
significantly the explained variation in $y$. In other words we would not
expect $y'(P - P_1)y$ to be significantly different from zero. In order to
test this we require the distribution of $y'(P - P_1)y$ on $H_0$.

On $H_0$ we know that $y$ is distributed as $N(X_1\beta_1, \sigma^2 I_T)$, or equivalently,
that $y - X_1\beta_1$ is distributed as $N(0, \sigma^2 I_T)$. Therefore,
$(y - X_1\beta_1)'A(y - X_1\beta_1)/\sigma^2$ is distributed as $\chi^2_{\mathrm{tr}A}$ if $A$ is an idempotent
matrix. We can show that $P - P_1$ is an idempotent matrix as follows:

$(P - P_1)^2 = P^2 - PP_1 - P_1P + P_1^2 = P - PP_1 - P_1P + P_1$

as $P$ and $P_1$ are idempotent. But since $X_1' = (I : 0)X'$,


$PP_1 = X(X'X)^{-1}X'X_1(X_1'X_1)^{-1}X_1' = X(X'X)^{-1}X'X\begin{pmatrix} I \\ 0 \end{pmatrix}(X_1'X_1)^{-1}X_1' = X\begin{pmatrix} I \\ 0 \end{pmatrix}(X_1'X_1)^{-1}X_1' = P_1$

and similarly $P_1P = P_1$, we find that

$(P - P_1)^2 = P - P_1 - P_1 + P_1 = P - P_1$

as required. It follows that $(y - X_1\beta_1)'(P - P_1)(y - X_1\beta_1)/\sigma^2$ is
distributed as $\chi^2_{k_2}$, since $\mathrm{tr}(P - P_1) = \mathrm{tr}P - \mathrm{tr}P_1 = k - k_1 = k_2$. But


$(P - P_1)X_1 = X(X'X)^{-1}X'X_1 - X_1(X_1'X_1)^{-1}X_1'X_1 = X\begin{pmatrix} I \\ 0 \end{pmatrix} - X_1 = X_1 - X_1 = 0.$

Therefore,

$(y - X_1\beta_1)'(P - P_1)(y - X_1\beta_1)/\sigma^2 = y'(P - P_1)y/\sigma^2.$

Thus, if $\sigma^2$ is known, we use the statistic $y'(P - P_1)y/\sigma^2 =
(RSS_0 - RSS_1)/\sigma^2$, which is distributed as $\chi^2_{k_2}$, to test $H_0$ against $H_1$. If
$\sigma^2$ is unknown then, as in question (2.2), we can replace $\sigma^2$ by the
estimator


$s^2 = RSS_1/(T - k) = y'(I - P)y/(T - k),$

to obtain


$\frac{y'(P - P_1)y}{y'(I - P)y/(T - k)} = \frac{y'(P - P_1)y/\sigma^2}{y'(I - P)y/\sigma^2}(T - k)$

We have shown above that on $H_0$, $y'(P - P_1)y/\sigma^2$ is distributed as a $\chi^2_{k_2}$.
We can also show that on $H_0$, $(y - X_1\beta_1)'(I - P)(y - X_1\beta_1)/\sigma^2
= y'(I - P)y/\sigma^2$ is distributed as a $\chi^2_{T-k}$ and that as

$(P - P_1)(I - P) = P - P_1 - P^2 + P_1P = P - P_1 - P + P_1 = 0$

these two are distributed independently of each other. Hence

$F = \frac{y'(P - P_1)y/\sigma^2}{y'(I - P)y/\sigma^2} \cdot \frac{T - k}{k_2} = \frac{RSS_0 - RSS_1}{RSS_1} \cdot \frac{T - k}{k_2}$    (2.3.6)


is distributed as $F_{k_2, T-k}$.
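
The projection-matrix properties used in Part (a) are easy to check numerically. The sketch below uses simulated data (the sample size, coefficient values and seed are arbitrary assumptions, not taken from the text) and confirms that $P - P_1$ is idempotent with trace $k_2$, that it annihilates $X_1$, and that $y'(P - P_1)y$ equals $RSS_0 - RSS_1$.

```python
import numpy as np

rng = np.random.default_rng(0)
T, k1, k2 = 40, 2, 3
k = k1 + k2

X1 = rng.normal(size=(T, k1))
X2 = rng.normal(size=(T, k2))
X = np.hstack([X1, X2])
y = X1 @ np.array([1.0, -0.5]) + rng.normal(size=T)  # data generated under H0

def projection(Z):
    """Projection matrix Z (Z'Z)^{-1} Z'."""
    return Z @ np.linalg.solve(Z.T @ Z, Z.T)

P, P1 = projection(X), projection(X1)
D = P - P1
I = np.eye(T)

rss0 = y @ (I - P1) @ y   # residual sum of squares on H0
rss1 = y @ (I - P) @ y    # residual sum of squares on H1

# F statistic (2.3.6), distributed as F(k2, T - k) on H0
F = (rss0 - rss1) / rss1 * (T - k) / k2

print("idempotent:", np.allclose(D @ D, D))
print("trace:", round(np.trace(D), 8))
print("F =", F)
```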


Part (b). The likelihood ratio test is based upon the ratio of the likelihood
function, evaluated on $H_0$, i.e. $L(H_0)$, to that evaluated on $H_1$, i.e. $L(H_1)$.
Let $\lambda = L(H_0)/L(H_1)$ then, since $L(H_0)$ and $L(H_1)$ are non-negative and
$L(H_0) \le L(H_1)$, we have $0 \le \lambda \le 1$. $L(H_0) \le L(H_1)$ because on $H_0$ we
are maximising on a subset of the parameters over which we are
maximising on $H_1$. Thus, in the present example, on $H_1$ we are
maximising over $\beta_1$, $\beta_2$ and $\sigma^2$ whereas on $H_0$ we are maximising over $\beta_1$
and $\sigma^2$. $L(H_0)$ can, therefore, be viewed as a constrained maximum in
which $\beta_2 = 0$.

If $H_0$ is correct then the data are generated by equation (2.3.2) and are,
therefore, consistent with $\beta_2 = 0$. In this case, we expect a value of $\lambda$
close to unity. A low value of $\lambda$ on the other hand suggests that we have
imposed an effective constraint on $H_0$, implying that $H_0$ is not correct.
Our decision criterion is, therefore, to reject $H_0$ if $\lambda < C_\alpha$, where $C_\alpha$ is
the critical level of $\lambda$ for a level of significance $\alpha$.

Since the OLS estimator is a maximum likelihood estimator for the
general linear model (2.3.1), we obtain the following expressions in the
present example:


$L(H_0) \propto (\hat\sigma^2)^{-T/2} \exp\left\{-\frac{(y - X_1\hat\beta_1)'(y - X_1\hat\beta_1)}{2\hat\sigma^2}\right\} = (\hat\sigma^2)^{-T/2} e^{-T/2}$


$L(H_1) \propto (\tilde\sigma^2)^{-T/2} \exp\left\{-\frac{(y - X\tilde\beta)'(y - X\tilde\beta)}{2\tilde\sigma^2}\right\} = (\tilde\sigma^2)^{-T/2} e^{-T/2}$


where the symbol $\propto$ means 'proportional to' and we denote a maximum
likelihood estimator on $H_0$ by $\hat{\ }$ and on $H_1$ by $\tilde{\ }$. It follows that


$\lambda = (\hat\sigma^2/\tilde\sigma^2)^{-T/2}$


But $\hat\sigma^2 = RSS_0/T$ and $\tilde\sigma^2 = RSS_1/T$, hence

$\lambda = \left(\frac{RSS_0}{RSS_1}\right)^{-T/2} = \left(1 + \frac{RSS_0 - RSS_1}{RSS_1}\right)^{-T/2} = \left(1 + \frac{k_2}{T - k}F\right)^{-T/2}$


where $F$ was defined in Part (a) above. $\lambda$ is, therefore, a decreasing
monotonic function of $F$, implying that low values of $\lambda$ correspond to
high values of $F$. (See Figure 2.1.) Thus if $\lambda < C_\alpha$ then $F > F_\alpha$, implying
that we would reject $H_0$ for $F > F_\alpha$. We have shown, therefore, that the
intuitively based test derived in Part (a) corresponds to a likelihood ratio
test.

[Figure 2.1: $\lambda$ plotted as a decreasing function of $F$, with the critical values $C_\alpha$ and $F_\alpha$ marked.]
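
The one-to-one, decreasing relation between $\lambda$ and $F$ can be illustrated in a few lines. The residual sums of squares and dimensions below are arbitrary illustrative values, not data from the question.

```python
def lr_lambda(rss0, rss1, T):
    """Likelihood ratio lambda = (RSS_1 / RSS_0)^(T/2)."""
    return (rss1 / rss0) ** (T / 2)

def lambda_from_f(F, T, k, k2):
    """The same ratio written as a function of the F statistic (2.3.6)."""
    return (1.0 + k2 * F / (T - k)) ** (-T / 2)

# Illustrative values only
T, k, k2 = 30, 5, 2
rss0, rss1 = 150.0, 100.0

F = (rss0 - rss1) / rss1 * (T - k) / k2
lam_direct = lr_lambda(rss0, rss1, T)
lam_via_f = lambda_from_f(F, T, k, k2)

# lambda falls as F rises, so rejecting for small lambda is rejecting for large F
lam_grid = [lambda_from_f(f, T, k, k2) for f in (0.0, 1.0, 2.0, 5.0)]

print(F, lam_direct, lam_via_f)
print(lam_grid)
```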


Part (c). First, we wish to test $H_0: \alpha \ne 0, \beta = \gamma = 0$ against $H_1: \alpha, \beta, \gamma \ne 0$
for the model


$y_t = \alpha + \beta x_{1t} + \gamma x_{2t} + u_t.$

From Parts (a) and (b) our test statistic is

$F = \frac{RSS_0 - RSS_1}{RSS_1} \cdot \frac{100 - 3}{2}$


where $RSS_0$ is the residual sum of squares on $H_0$, i.e. of the regression of
$y$ on a constant, which is just $\sum (y - \bar{y})^2$, and $RSS_1$ is the residual sum of
squares of the regression of $y$ on a constant, $x_1$ and $x_2$.


Now $\sum (y - \bar{y})^2 = 100 \times 2000$ and $RSS_1$ is already calculated in the
answer to question (2.1) to be $100 \times 360$. Therefore


$F = \frac{100(2000 - 360)}{100 \times 360} \cdot \frac{97}{2} = 221$


This value is highly significant at all conventional significance levels. We
reject $H_0$, therefore, in favour of $H_1$.
We now want to test $H_2: \alpha, \beta \ne 0, \gamma = 0$ against $H_1$. Our test becomes


$F = \frac{RSS_2 - RSS_1}{RSS_1} \cdot \frac{T - k}{1}$


where $RSS_2$ is the residual sum of squares of the regression of $y$ on a
constant and $x_1$. Instead of computing $RSS_2$, however, there is a simpler
but equivalent way to proceed. We note that $F$ above is distributed as
$F_{1, T-k}$ and that the distribution of $F_{1, T-k}$ is identical to that of $t^2_{T-k}$,
i.e. the square of a Student's t-variate. Thus our equivalent test is to use a
t-test on the coefficient $\gamma$. We recall that $t = c/SE(c)$ where $SE(c)$ denotes
the standard error of the estimate $c$ of $\gamma$, and hence from question (2.1) we have $t =
16/1.218 = 13.14$. This is also highly significant at all conventional
significance levels. We may, therefore, also reject $H_2$ in favour of $H_1$.
See Theil (1971, pp. 137-143) for further discussion of hypothesis
testing in the linear model.
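
Both statistics in Part (c) follow directly from the summary figures of question (2.1). A short check, with illustrative variable names:

```python
T, k = 100, 3
rss0 = T * 2000.0   # regression of y on a constant only: sum of (y - y_bar)^2
rss1 = T * 360.0    # regression of y on a constant, x1 and x2

# Joint test of beta = gamma = 0 (two restrictions), statistic (2.3.6)
f_joint = (rss0 - rss1) / rss1 * (T - k) / 2

# Test of gamma = 0 through the t-ratio of the estimate c = 16, SE(c) = 1.218
c, se_c = 16.0, 1.218
t_stat = c / se_c
f_single = t_stat**2   # equivalent F(1, T - k) statistic

print(round(f_joint, 2), round(t_stat, 2), round(f_single, 1))
```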
Solution 2.4 


Part (a). The partial correlation coefficient $r_{y1.2}$ is usually defined as
follows (cf. Johnston, 1972, pp. 61-65 and Theil, 1971, pp. 171-175)


$r_{y1.2} = \frac{r_{y1} - r_{y2} r_{12}}{\sqrt{(1 - r_{y2}^2)(1 - r_{12}^2)}}$    (2.4.2)


where $r_{ab}$ denotes the correlation of $a$ and $b$. We denote the residual sum
of squares of the regression of $y$ on $x_1, x_2, x_3, \ldots$ as $RSS_{y.123\ldots}$. The
question requires that we show that


$r_{y1.2}^2 = \frac{RSS_{y.2} - RSS_{y.12}}{RSS_{y.2}}$    (2.4.3)

In general

$RSS_{y.123\ldots} = y'y - y'X(X'X)^{-1}X'y$

where $X = (x_1 : x_2 : x_3 : \ldots)$. Thus for $X$ consisting of the single variable $x_2$,

$RSS_{y.2} = y'y - y'x_2(x_2'x_2)^{-1}x_2'y$
$= y'y - (y'x_2)^2/x_2'x_2$

If all variables are measured about their means (this is equivalent to
including a constant in the regression)


$RSS_{y.2} = y'y\{1 - (y'x_2)^2/[(x_2'x_2)(y'y)]\}$
$= y'y(1 - r_{y2}^2)$    (2.4.4)

Similarly, for $X = [x_1 : x_2]$,


$RSS_{y.12} = y'y - (y'x_1, y'x_2)\begin{pmatrix} x_1'x_1 & x_1'x_2 \\ x_2'x_1 & x_2'x_2 \end{pmatrix}^{-1}\begin{pmatrix} x_1'y \\ x_2'y \end{pmatrix}$


$= y'y - \frac{1}{(x_1'x_1)(x_2'x_2) - (x_1'x_2)^2}(y'x_1, y'x_2)\begin{pmatrix} x_2'x_2 & -x_1'x_2 \\ -x_2'x_1 & x_1'x_1 \end{pmatrix}\begin{pmatrix} x_1'y \\ x_2'y \end{pmatrix}$

$= y'y - \frac{(y'x_1)^2(x_2'x_2) - 2(y'x_1)(y'x_2)(x_1'x_2) + (y'x_2)^2(x_1'x_1)}{(x_1'x_1)(x_2'x_2) - (x_1'x_2)^2}$

$= y'y\left(1 - \frac{r_{y1}^2 - 2r_{y1}r_{y2}r_{12} + r_{y2}^2}{1 - r_{12}^2}\right)$    (2.4.5)

It follows that


$\frac{RSS_{y.2} - RSS_{y.12}}{RSS_{y.2}} = \frac{(1 - r_{y2}^2) - \left(1 - \dfrac{r_{y1}^2 - 2r_{y1}r_{y2}r_{12} + r_{y2}^2}{1 - r_{12}^2}\right)}{1 - r_{y2}^2}$

$= \frac{-r_{y2}^2(1 - r_{12}^2) + r_{y1}^2 - 2r_{y1}r_{y2}r_{12} + r_{y2}^2}{(1 - r_{y2}^2)(1 - r_{12}^2)}$

$= \frac{r_{y1}^2 - 2r_{y1}r_{y2}r_{12} + r_{y2}^2 r_{12}^2}{(1 - r_{y2}^2)(1 - r_{12}^2)}$

$= \frac{(r_{y1} - r_{y2}r_{12})^2}{(1 - r_{y2}^2)(1 - r_{12}^2)}$

$= r_{y1.2}^2$

the required result.
The following more general result can be obtained in a similar way

$r_{y1.23\ldots k}^2 = \frac{RSS_{y.23\ldots k} - RSS_{y.123\ldots k}}{RSS_{y.23\ldots k}}$    (2.4.6)
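
The identity (2.4.3) between the squared partial correlation and the proportional fall in the residual sum of squares can be verified on simulated data. The data-generating values and the seed below are arbitrary assumptions made only for the check.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 200
x2 = rng.normal(size=T)
x1 = 0.6 * x2 + rng.normal(size=T)
y = 1.0 * x1 - 2.0 * x2 + rng.normal(size=T)

# Measure all variables about their means (equivalent to including a constant)
y, x1, x2 = (v - v.mean() for v in (y, x1, x2))

def rss(dep, *regressors):
    """Residual sum of squares from the least squares regression of dep on regressors."""
    X = np.column_stack(regressors)
    coef, *_ = np.linalg.lstsq(X, dep, rcond=None)
    resid = dep - X @ coef
    return resid @ resid

def corr(a, b):
    """Simple correlation of two mean-corrected vectors."""
    return (a @ b) / np.sqrt((a @ a) * (b @ b))

r_y1, r_y2, r_12 = corr(y, x1), corr(y, x2), corr(x1, x2)

# Definition (2.4.2), squared
partial_sq = (r_y1 - r_y2 * r_12) ** 2 / ((1 - r_y2**2) * (1 - r_12**2))

# Right-hand side of (2.4.3)
rss_ratio = (rss(y, x2) - rss(y, x1, x2)) / rss(y, x2)

print(partial_sq, rss_ratio)
```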


Part (b). For the model

$y = \beta_1 x_1 + \ldots + \beta_k x_k + u$    (2.4.7)


where all variables are measured as deviations about their means we
define $\bar{R}^2$ for $k$ explanatory variables as


$\bar{R}_k^2 = R_k^2 - \frac{k}{T - k - 1}(1 - R_k^2)$    (2.4.8)

with

$R_k^2 = 1 - \frac{RSS_{y.123\ldots k}}{y'y}$    (2.4.9)

Thus

$\bar{R}_k^2 = 1 - \frac{RSS_{y.123\ldots k}}{y'y} - \frac{k}{T - k - 1} \cdot \frac{RSS_{y.123\ldots k}}{y'y} = 1 - \frac{T - 1}{T - k - 1} \cdot \frac{RSS_{y.123\ldots k}}{y'y}$

or

$\frac{RSS_{y.123\ldots k}}{y'y} = \frac{T - k - 1}{T - 1}(1 - \bar{R}_k^2)$    (2.4.10)

Similarly,

$\frac{RSS_{y.123\ldots(k-1)}}{y'y} = \frac{T - k}{T - 1}(1 - \bar{R}_{k-1}^2)$    (2.4.11)

Hence, taking the ratio of (2.4.11) to (2.4.10),

$\frac{RSS_{y.123\ldots(k-1)}}{RSS_{y.123\ldots k}} = \frac{T - k}{T - k - 1} \cdot \frac{1 - \bar{R}_{k-1}^2}{1 - \bar{R}_k^2}$    (2.4.12)


But from question (2.3),

$t_k^2 = \frac{RSS_{y.123\ldots(k-1)} - RSS_{y.123\ldots k}}{RSS_{y.123\ldots k}}(T - k - 1)$

or, rearranging,

$\frac{RSS_{y.123\ldots(k-1)}}{RSS_{y.123\ldots k}} = 1 + \frac{t_k^2}{T - k - 1}$    (2.4.13)

Combining (2.4.12) and (2.4.13) we obtain

$\frac{T - k}{T - k - 1} \cdot \frac{1 - \bar{R}_{k-1}^2}{1 - \bar{R}_k^2} = 1 + \frac{t_k^2}{T - k - 1}$

or

$\frac{1 - \bar{R}_{k-1}^2}{1 - \bar{R}_k^2} = \frac{T - k - 1 + t_k^2}{T - k} = 1 + \frac{t_k^2 - 1}{T - k}$    (2.4.14)


Now $\bar{R}_k^2 \ge \bar{R}_{k-1}^2$ implies $[(1 - \bar{R}_{k-1}^2)/(1 - \bar{R}_k^2)] \ge 1$; but
$[1 + (t^2 - 1)/(T - k)] \ge 1$ if and only if $t^2 \ge 1$, since $T - k > 0$. We
have shown, therefore, that $\bar{R}^2$ increases as a result of adding a regressor
if that regressor has a t-statistic greater than unity in absolute value. A further implication
of this result is that in order to maximise $\bar{R}^2$ we must exclude all
variables with t-statistics less than unity. This follows because $\bar{R}_k^2 < \bar{R}_{k-1}^2$
if $t^2 < 1$ for the $k$th variable.
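
Relation (2.4.14) and the resulting "t greater than one" rule can be checked on simulated data. In this sketch the sample size, coefficients and seed are arbitrary assumptions; variables are mean-corrected, so the residual degrees of freedom with $k$ regressors are $T - k - 1$.

```python
import numpy as np

rng = np.random.default_rng(2)
T, k = 60, 4
X = rng.normal(size=(T, k))
y = X @ np.array([1.0, 0.5, 0.0, 0.1]) + rng.normal(size=T)

# Deviations about means (equivalent to including a constant term)
X = X - X.mean(axis=0)
y = y - y.mean()

def fit(Z, dep):
    """Return least squares coefficients and the residual sum of squares."""
    coef = np.linalg.solve(Z.T @ Z, Z.T @ dep)
    resid = dep - Z @ coef
    return coef, resid @ resid

def adjusted_r2(rss, tss, T, k):
    """Corrected R^2 as in (2.4.8) for k mean-corrected regressors."""
    r2 = 1.0 - rss / tss
    return r2 - k / (T - k - 1) * (1.0 - r2)

tss = y @ y
coef_k, rss_k = fit(X, y)            # all k regressors
_, rss_km1 = fit(X[:, :-1], y)       # kth regressor dropped

# Squared t-ratio of the kth coefficient
s2 = rss_k / (T - k - 1)
t2 = coef_k[-1] ** 2 / (s2 * np.linalg.inv(X.T @ X)[-1, -1])

adj_k = adjusted_r2(rss_k, tss, T, k)
adj_km1 = adjusted_r2(rss_km1, tss, T, k - 1)

lhs = (1.0 - adj_km1) / (1.0 - adj_k)
rhs = 1.0 + (t2 - 1.0) / (T - k)     # right-hand side of (2.4.14)

print(lhs, rhs, t2, adj_k > adj_km1)
```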


Part (c). From (2.4.6) the squared partial correlation coefficient of $y$ and
$x_2$ given $x_1$ (and a constant) is

$r_{y2.1}^2 = \frac{RSS_{y.1} - RSS_{y.12}}{RSS_{y.1}}$    (2.4.15)


We shall now show how it is possible to calculate $r_{y2.1}^2$ from the results of
question (2.3), i.e. given the t-statistic for the coefficient of $x_2$.

In question (2.3), $RSS_{y.1}$ and $RSS_{y.12}$ were denoted by $RSS_2$ and
$RSS_1$, respectively. It was also shown that


$t^2 = F = \frac{RSS_2 - RSS_1}{RSS_1}(T - 3)$    (2.4.16)

From (2.4.16) it follows that

$\frac{RSS_2}{RSS_1} = 1 + \frac{t^2}{T - 3}$

Consequently,

$r_{y2.1}^2 = \frac{RSS_2 - RSS_1}{RSS_2} = 1 - \frac{RSS_1}{RSS_2} = 1 - \left(1 + \frac{t^2}{T - 3}\right)^{-1} = \frac{t^2}{t^2 + T - 3}$

$= \frac{13.14^2}{13.14^2 + 97}$

$= 0.640.$


Hence $r_{y2.1} = 0.800$, our required result. We can interpret this as implying
that there is a 64% reduction in the unexplained variance of the model as
a consequence of introducing $x_2$ in addition to $x_1$ and a constant.

Finally, since we already know that the t-statistic of $x_2$ is greater than
unity, we are required to show that $\bar{R}^2$ increases through adding $x_2$. For
a regression of $y$ on $x_1$ and a constant, equation (2.4.8) yields


$\bar{R}^2 = R^2 - \frac{1}{T - 2}(1 - R^2)$


But for the bivariate model

$R^2 = r_{y1}^2 = m_{1y}^2/(m_{yy} m_{11})$


where $m_{1y} = \sum x_{1t} y_t/T$ etc., with the variables measured about their means. Thus


$R^2 = \frac{100^2}{2000 \times 10} = 0.5$


and hence

$\bar{R}^2 = 0.5 - \frac{1}{98}(1 - 0.5) = 0.495.$


In question (2.1) it was shown that $\bar{R}^2$ for the regression of $y$ on $x_1$, $x_2$
and a constant equals 0.816. As expected, therefore, $\bar{R}^2$ has increased
through adding $x_2$.
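
The numbers in Part (c) follow from the t-ratio and the moments alone, as this short check shows (variable names are illustrative):

```python
T = 100
t_stat = 13.14   # t-ratio of the coefficient of x2, from question (2.3)

# Squared partial correlation of y and x2 given x1 and a constant
r2_partial = t_stat**2 / (t_stat**2 + T - 3)
r_partial = r2_partial ** 0.5

# Regression of y on x1 and a constant: R^2 = m_1y^2 / (m_yy m_11)
r2_one = 100.0**2 / (2000.0 * 10.0)
r2_adj_one = r2_one - 1.0 / (T - 2) * (1.0 - r2_one)

# Corrected R^2 with x1, x2 and a constant, from question (2.1)
r2_adj_two = 0.8163

print(round(r2_partial, 3), round(r_partial, 3), round(r2_adj_one, 3))
print("corrected R^2 rises when x2 is added:", r2_adj_two > r2_adj_one)
```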


Solution 2.5

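Part (a). We require a constrained least squares estimator instead of an
unconstrained least squares estimator. To minimise $u'u$ with respect to
$\beta$ subject to the linear constraint $R\beta = r$ we set up the Lagrangean (see
Johnston, 1972, pp. 157-159 and Theil, 1971, pp. 143-145).


$\mathcal{L} = u'u + 2\lambda'(R\beta - r)$    (2.5.1)


Differentiating with respect to $\beta$ and $\lambda$ and putting the result equal to
zero we have


$\frac{\partial \mathcal{L}}{\partial \beta} = -2X'y + 2X'X\hat\beta + 2R'\hat\lambda = 0$    (2.5.2)

$\frac{\partial \mathcal{L}}{\partial \lambda} = 2(R\hat\beta - r) = 0$    (2.5.3)


or, more compactly,


$\begin{pmatrix} X'X & R' \\ R & 0 \end{pmatrix}\begin{pmatrix} \hat\beta \\ \hat\lambda \end{pmatrix} = \begin{pmatrix} X'y \\ r \end{pmatrix}$


Hence

$\begin{pmatrix} \hat\beta \\ \hat\lambda \end{pmatrix} = \begin{pmatrix} X'X & R' \\ R & 0 \end{pmatrix}^{-1}\begin{pmatrix} X'y \\ r \end{pmatrix}$    (2.5.4)

$= \begin{pmatrix} (X'X)^{-1}[I - R'AR(X'X)^{-1}] & (X'X)^{-1}R'A \\ AR(X'X)^{-1} & -A \end{pmatrix}\begin{pmatrix} X'y \\ r \end{pmatrix}$    (2.5.5)


where $A = [R(X'X)^{-1}R']^{-1}$. It follows that

$\hat\beta = [I - (X'X)^{-1}R'AR](X'X)^{-1}X'y + (X'X)^{-1}R'Ar$
$= b + W(r - Rb),$    (2.5.6)


where $W = (X'X)^{-1}R'A$. Equation (2.5.6) shows how it is possible to use
$b$ to obtain an alternative estimator of $\beta$. To prove that $\hat\beta$ is more efficient
than $b$ first we note that


$E(\hat\beta) = E(b) + W(r - RE(b)) = \beta + W(r - R\beta) = \beta$


and second that


$\hat\beta - \beta = b - \beta + W(r - Rb) - W(r - R\beta)$
$= (I - WR)(b - \beta)$


Setting the covariance matrix of $b$ equal to $V = \sigma^2(X'X)^{-1}$, the covariance
matrix of $\hat\beta$ is given by


$V_{\hat\beta} = (I - WR)V(I - WR)'$
$= V - WRV - VR'W' + WRVR'W'$

But from the definition of $W$ and $V$ it follows that

$WRV = VR'W' = WRVR'W'$
$= \sigma^2(X'X)^{-1}R'[R(X'X)^{-1}R']^{-1}R(X'X)^{-1}$
$= VR'(RVR')^{-1}RV,$

which is non-negative definite. Thus

$V_{\hat\beta} = V - VR'(RVR')^{-1}RV$

and so $\hat\beta$ is more efficient than $b$ because $V$ exceeds $V_{\hat\beta}$ by a non-negative
definite matrix.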
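
The restricted estimator (2.5.6) and its efficiency gain can be illustrated on simulated data. Everything in this sketch (sample size, coefficients, the particular restriction $5\beta_1 - \beta_2 = 0$, the seed) is an assumption chosen for the illustration; the covariance matrices are computed up to the common factor $\sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(3)
T, k = 50, 3
X = rng.normal(size=(T, k))
y = X @ np.array([2.0, 10.0, 1.0]) + rng.normal(size=T)

# One linear restriction R beta = r, here 5*beta_1 - beta_2 = 0
R = np.array([[5.0, -1.0, 0.0]])
r = np.array([0.0])

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y                   # unrestricted OLS estimator

A = np.linalg.inv(R @ XtX_inv @ R.T)    # A = [R (X'X)^{-1} R']^{-1}
W = XtX_inv @ R.T @ A                   # W = (X'X)^{-1} R' A
beta_hat = b + W @ (r - R @ b)          # restricted estimator (2.5.6)
lam_hat = A @ (R @ b - r)               # Lagrange multiplier estimate from (2.5.5)

# Covariance matrices up to the factor sigma^2
V = XtX_inv
V_restricted = V - V @ R.T @ np.linalg.inv(R @ V @ R.T) @ R @ V
gain = V - V_restricted                 # should be non-negative definite

print("R beta_hat =", R @ beta_hat)
print("smallest eigenvalue of V - V_restricted:", np.linalg.eigvalsh(gain).min())
```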


Part (b). The null hypothesis $H_0$ is a special case of the more general
hypothesis $R\beta = r$ or, equivalently, $R\beta - r = 0$. In this case, if all
variables are measured about their means, $\alpha$ is eliminated and $H_0$ becomes


$(5, -1)\begin{pmatrix} \beta^* \\ \gamma \end{pmatrix} = 0$


where, to avoid confusion, we have re-defined the scalar $\beta$ of equation
(2.1.1) as $\beta^*$. The alternative hypothesis is $R\beta \ne r$ or $R\beta - r \ne 0$, implying
that $\beta$ is unrestricted. The Wald test of $H_0: R\beta - r = 0$ against $H_1: R\beta - r
\ne 0$ consists first of obtaining an estimator $b$ of $\beta$ on $H_1$ and then using
the distribution of $Rb - r$ as the basis of the test. On $H_1$, $b$ is just the
unrestricted OLS estimator and hence $Rb - r$ is distributed as a
$N[R\beta - r, \sigma^2 R(X'X)^{-1}R']$ variable. This follows from the fact that
$E(b) = \beta$ and hence

$E(Rb - r) = R\beta - r$

and the covariance matrix of $Rb - r$ is

$E[(Rb - r) - (R\beta - r)][(Rb - r) - (R\beta - r)]'$
$= E[R(b - \beta)(b - \beta)'R']$
$= RE(b - \beta)(b - \beta)'R' = \sigma^2 R(X'X)^{-1}R'.$

For $\sigma^2$ known, therefore, the test statistic for $H_0$ against $H_1$ is

$K_1 = (Rb - r)'[R(X'X)^{-1}R']^{-1}(Rb - r)/\sigma^2$    (2.5.7)


which is distributed as $\chi^2_m$, where $m$ is the number of restrictions. For $\sigma^2$
unknown we can replace $\sigma^2$ by the estimator $s^2 = \hat{u}'\hat{u}/(T - k)$ where $\hat{u}$ is
the vector of residuals from a least squares regression with no restrictions
on the coefficients. The statistic


$K_2 = (Rb - r)'[R(X'X)^{-1}R']^{-1}(Rb - r)/s^2$    (2.5.8)


is asymptotically distributed as $\chi^2_m$ (see question 2.16 for further details
on asymptotic distribution theory). Alternatively, we can use the fact that
$F = K_2/m$ is distributed as $F_{m, T-k}$ in finite samples (Dhrymes et al.,
1972).


For the present problem $m = 1$, hence

$F = T(Rb - r)'[R m_{xx}^{-1} R']^{-1}(Rb - r)/s^2$


where we have used the result from question (2.1) that the $ij$th element of
$m_{xx}$ is $(1/T)\sum_t (x_{it} - \bar{x}_i)(x_{jt} - \bar{x}_j)$. Thus


$F = 100\,(5 \times 2 - 16)\left[(5, -1)\begin{pmatrix} 10 & 5 \\ 5 & 5 \end{pmatrix}^{-1}\begin{pmatrix} 5 \\ -1 \end{pmatrix}\right]^{-1}(5 \times 2 - 16)/371.134$


$= 100\,(-6)[185/25]^{-1}(-6)/371.134 = 1.31$

But $F_{1,97}(0.05) = 3.94$, hence $F < F_{1,97}(0.05)$ and so we cannot reject
$H_0: 5\beta^* = \gamma$ at the 5% level of significance.
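
The Wald F statistic for the single restriction $5\beta^* = \gamma$ can be recomputed from the estimates and moments of question (2.1). A minimal sketch with illustrative names:

```python
import numpy as np

T = 100
s2 = 100 * 360.0 / 97                        # s^2 from question (2.1)
m_xx = np.array([[10.0, 5.0], [5.0, 5.0]])   # moments of x1, x2 about their means
b_star = np.array([2.0, 16.0])               # estimates of (beta*, gamma)

R = np.array([5.0, -1.0])                    # restriction 5*beta* - gamma = 0
r = 0.0

gap = R @ b_star - r                         # Rb - r
quad = R @ np.linalg.solve(m_xx, R)          # R m_xx^{-1} R'

# With one restriction, F = T (Rb - r)^2 / (R m_xx^{-1} R' s^2)
F = T * gap**2 / (quad * s2)

print(gap, quad, round(F, 2))
```

Because $(X'X)^{-1}$ for the mean-corrected regressors is $m_{xx}^{-1}/T$, the factor $T$ appears in the numerator.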


Remark. There are two alternative tests to the Wald test used above. 
These are the Aitchison—Silvey (or Lagrange multiplier test) and the 
likelihood ratio test which was explained in question (2.3). The Lagrange 
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multiplier test involves testing Hy: A = 0 against H, : \ #0, where J is 
the vector of Lagrange multipliers defined in part (a). The rationale 
behind this test is as follows. If the coefficients satisfy the restrictions 
RB =r, then imposing this constraint during estimation is unnecessary. 
The Lagrange multipliers measure the sensitivity of the minimised sum of 
squared errors to small changes in the constraint. If the constraint is 
already satisfied by the model, then small changes in the constraint would 
not affect the value of the residual sum of squares and hence the Lagrange 
multipliers would be zero. If the residual sum of squares is sensitive to 
the constraint then the Lagrange multipliers would be non-zero. In this 
case the model does not already satisfy the restrictions. 

An estimate of \ can be obtained from equation (2.5.5) as 


A = A(Rb—r) 
= AR(b — 8) + A(RB—7r) 
On Hy: RB —r=0 hence 
X = AR(b —8) 
Since E(b) = B we find that E(a) = ( and the covariance matrix of ) is 
V, = ARVR'A' 
But V = 0?(X'X)! and A = [R(X'X)' R']"', thus 
V, = 0A = oO [R(X X)'R']'. 
On H), therefore, d is distributed as a N(0, V,) variable. 


For $\sigma^2$ known, to test $H_0: \lambda = 0$ against $H_1: \lambda \neq 0$ we can use the test
statistic

$$\hat\lambda' V_\lambda^{-1} \hat\lambda = (Rb - r)'[R(X'X)^{-1}R']^{-1}(Rb - r)/\sigma^2,$$

which is identical to $K_1$ defined by equation (2.5.7). If $\sigma^2$ is unknown
and we estimate it by $s^2 = \hat u'\hat u/(T - k)$ as above, then we obtain
$K_2$ once more. On the other hand, if we estimate $\sigma^2$ by
$\tilde\sigma^2 = \tilde u'\tilde u/(T - k)$, where $\tilde u$ are the residuals from the restricted
regression, then

$$K_3 = (Rb - r)'[R(X'X)^{-1}R']^{-1}(Rb - r)/\tilde\sigma^2$$

is asymptotically distributed as $\chi^2_m$. However, whereas $K_2/m$ has an $F$
distribution in finite samples, $K_3/m$ does not.

The likelihood ratio test statistic can be shown to be
$l = (\hat u'\hat u/\tilde u'\tilde u)^{T/2}$. On $H_0$, the asymptotic distribution of $-2 \ln l$ is $\chi^2_m$.

The reader may wish to attempt question (2.28) which involves an 
inequality between the three test statistics, the Lagrange multiplier, the 
likelihood ratio and the Wald test statistics. 
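
The ordering of the three statistics can be checked numerically. The sketch below is illustrative only: the data are made up, the single restriction is a zero slope in a simple regression, and, as an assumption that differs from the $T - k$ divisors used above, each statistic uses the maximum likelihood variance estimate with divisor $T$, which is the usual setting for the inequality.

```python
import math

# Hypothetical data: y regressed on a constant and x; restriction: slope = 0.
y = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 7.0]
x = [float(t) for t in range(1, 9)]
T = len(y)

ybar, xbar = sum(y) / T, sum(x) / T
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

rss_r = sum((yi - ybar) ** 2 for yi in y)  # restricted residual sum of squares
rss_u = rss_r - sxy ** 2 / sxx             # unrestricted residual sum of squares

# Variance estimated by RSS/T in each case (an assumption of this sketch).
wald = T * (rss_r - rss_u) / rss_u
lr = T * math.log(rss_r / rss_u)
lm = T * (rss_r - rss_u) / rss_r

print(f"Wald = {wald:.3f}, LR = {lr:.3f}, LM = {lm:.3f}")
```

With these conventions the ordering Wald >= LR >= LM follows from $z - 1 \geq \ln z \geq 1 - 1/z$ applied to $z = RSS_r/RSS_u \geq 1$.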
Solution 2.6


Part (a). Consider the linear model

$$y_t = x_t'\beta + u_t \qquad (t = 1, \ldots, T) \qquad (2.6.1)$$

where $\beta$ is $k \times 1$ and the $u_t$ are independent $N(0, \sigma^2)$. A structural
change is said to have occurred if the model (2.6.1) is correct for a certain
time period, say $t = 1, \ldots, T_1$, but not for another, say $t = T_1 + 1, \ldots, T$.
We shall assume that in each time period the distribution of the $u_t$ remains
unchanged. It follows that the structural change is due to a change in $\beta$
from $\beta_1$, say, to $\beta_1 + \gamma$. If we wish to test for the occurrence of structural
change then we can test the null hypothesis $H_0: \gamma = 0$ against the
alternative $H_1: \gamma \neq 0$. On $H_0$ we have

$$y_t = x_t'\beta_1 + u_t \qquad (t = 1, \ldots, T) \qquad (2.6.2)$$

or

$$y = X\beta_1 + u \qquad (2.6.3)$$

and on $H_1$:

$$y_t = x_t'\beta_1 + u_t \qquad (t = 1, \ldots, T_1) \qquad (2.6.4a)$$

$$y_t = x_t'\beta_1 + x_t'\gamma + u_t \qquad (t = T_1 + 1, \ldots, T) \qquad (2.6.4b)$$

or

$$y_1 = X_1\beta_1 + u_1 \qquad (2.6.5a)$$

$$y_2 = X_2\beta_1 + X_2\gamma + u_2 \qquad (2.6.5b)$$

respectively, which can be written more compactly as

$$\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} X_1 & 0 \\ X_2 & X_2 \end{bmatrix} \begin{bmatrix} \beta_1 \\ \gamma \end{bmatrix} + \begin{bmatrix} u_1 \\ u_2 \end{bmatrix} \qquad (2.6.6)$$

or

$$y = X\beta_1 + X^*\gamma + u \qquad (2.6.7)$$

where $X^* = [0 : X_2']'$. Thus testing for structural change is equivalent to
testing for the significance of the additional variables $X^*$. The details of
this test are given in question (2.3). For $\sigma^2$ unknown, the test statistic
$F = \Delta ESS_1/(s^2 k)$ has an $F_{k, T-2k}$ distribution with $s^2 = RSS/(T - 2k)$.


Part (b). The above test for structural change is not of use if the number of
observations in the second time period, $T - T_1$, is less than $k$; an
alternative test is required in this case. To see this we can rewrite
equation (2.6.6) as

$$\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} X_1 & 0 \\ 0 & X_2 \end{bmatrix} \begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix} + \begin{bmatrix} u_1 \\ u_2 \end{bmatrix} \qquad (2.6.8)$$

or, more compactly, as

$$y = \bar X \bar\beta + u \qquad (2.6.9)$$

where $\beta_2 = \beta_1 + \gamma$, $\bar X$ denotes the block-diagonal matrix in (2.6.8) and
$\bar\beta = (\beta_1', \beta_2')'$.

If $T_1 \geq k$ and $T - T_1 \geq k$ then $X_1$, $X_2$, $X^*$ and $\bar X$ are all of full column
rank. In this case the residual sum of squares of (2.6.9) or, equivalently,
of (2.6.7), can be shown to be the sum of the residual sum of squares of a
regression of $y_1$ on $X_1$ (say $RSS_1$) and the residual sum of squares of $y_2$
on $X_2$ (say $RSS_2$). But if $T - T_1 < k$ then $X_2$, $X^*$ and $\bar X$ are not of full
column rank; in other words their cross-product matrices are singular. The
regressions (2.6.7), (2.6.9) and $y_2$ on $X_2$ are now no longer uniquely defined.
Since there are more explanatory variables than there are observations, $y_2$ can be
explained perfectly by $X_2$ and hence $RSS_2 = 0$. It follows that $RSS$, the
residual sum of squares of (2.6.9), reduces to just $RSS_1$, the residual sum
of squares of a regression of $y_1$ on $X_1$. This must now be used to estimate
$\sigma^2$. Our new estimator is $s_1^2 = RSS_1/(T_1 - k)$.

For $\sigma^2$ known, to test for structural change, we use the test statistic
$(T_1 - k)s_1^2/\sigma^2$ which is distributed $\chi^2_{T_1-k}$. For $\sigma^2$ unknown and with
$T_1 \geq k$ and $T - T_1 \geq k$, our test statistic was based on the proportionate
increase in the residual sum of squares due to omitting $X^*$ from (2.6.7).
That is

$$F = \frac{\Delta ESS_1}{s^2 k} = \frac{RSS_0 - RSS}{RSS} \cdot \frac{T - 2k}{k}$$

where $RSS_0$ is the residual sum of squares from the regression of $y$ on $X$.
When $T - T_1 < k$, $RSS = RSS_1$ and our test statistic becomes

$$F = \frac{RSS_0 - RSS_1}{RSS_1} \cdot \frac{T_1 - k}{T - T_1}$$

which is distributed as $F_{T-T_1, T_1-k}$. The proof of this result is left as an
exercise for the reader. The test statistics derived above were first obtained
by Chow (1960) and are sometimes known as the Chow test for structural
change. See Fisher (1970) for further discussion of $F$ tests for structural
change.
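
As a rough numerical companion to the two statistics above, the sketch below computes both forms on made-up data. The helper names (`solve`, `rss`, `chow`) and the data are illustrative assumptions, not part of the original solution; each regression is obtained by solving the normal equations directly.

```python
def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))  # partial pivoting
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [v - f * w for v, w in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def rss(X, y):
    """Residual sum of squares from the OLS regression of y on the columns of X."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(X, y)) for i in range(k)]
    b = solve(xtx, xty)
    return sum((v - sum(bi * xi for bi, xi in zip(b, r))) ** 2 for r, v in zip(X, y))


def chow(X1, y1, X2, y2):
    """Return (F, df1, df2) for the test of structural change."""
    k, T1, T2 = len(X1[0]), len(y1), len(y2)
    rss0 = rss(X1 + X2, y1 + y2)   # pooled regression (H0)
    rss1 = rss(X1, y1)
    if T2 >= k:                    # both sub-samples can be fitted separately
        total, df1, df2 = rss1 + rss(X2, y2), k, T1 + T2 - 2 * k
    else:                          # too few observations in the second period
        total, df1, df2 = rss1, T2, T1 - k
    return (rss0 - total) / total * df2 / df1, df1, df2


# Hypothetical data with a break after t = 6 (k = 2: constant and trend).
X = [[1.0, float(t)] for t in range(1, 13)]
noise = [0.1, -0.2, 0.15, -0.05, 0.1, -0.1]
y = ([1 + 2 * t + e for t, e in zip(range(1, 7), noise)]
     + [4 + 1 * t + e for t, e in zip(range(7, 13), noise)])

F_full, df1, df2 = chow(X[:6], y[:6], X[6:], y[6:])     # T - T1 = 6 >= k
F_pred, p1, p2 = chow(X[:6], y[:6], X[6:7], y[6:7])     # T - T1 = 1 < k
print(F_full, (df1, df2), F_pred, (p1, p2))
```

The second call mimics the case $T - T_1 < k$, where only the first sub-sample contributes to the estimate of $\sigma^2$.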


Solution 2.7 


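Part (a). $X_1^*$, the residuals of the regression of $X_1$ on $X_2$, are defined by
$X_1^* = Q_2X_1$ where $Q_2 = I - X_2(X_2'X_2)^{-1}X_2'$. It follows that $X_1^{*\prime}X_2 = 0$,
i.e. $X_1^*$ is orthogonal to $X_2$. As a result, the estimates of $\beta_1$ and $\beta_2$
obtained from (2.7.2) are identically equal to the regression coefficients
of $y$ on $X_1^*$ and $y$ on $X_2$, respectively. To see this consider

$$\begin{bmatrix} \hat\beta_1 \\ \hat\beta_2 \end{bmatrix} = \begin{bmatrix} X_1^{*\prime}X_1^* & X_1^{*\prime}X_2 \\ X_2'X_1^* & X_2'X_2 \end{bmatrix}^{-1} \begin{bmatrix} X_1^{*\prime}y \\ X_2'y \end{bmatrix} = \begin{bmatrix} X_1^{*\prime}X_1^* & 0 \\ 0 & X_2'X_2 \end{bmatrix}^{-1} \begin{bmatrix} X_1^{*\prime}y \\ X_2'y \end{bmatrix} = \begin{bmatrix} (X_1^{*\prime}X_1^*)^{-1}X_1^{*\prime}y \\ (X_2'X_2)^{-1}X_2'y \end{bmatrix} \qquad (2.7.3)$$

which is the required result.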
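Part (b). In order to obtain the expected value of $\hat\beta_2$, substitute (2.7.1)
into the expression for $\hat\beta_2$ in (2.7.3):

$$\hat\beta_2 = (X_2'X_2)^{-1}X_2'y = (X_2'X_2)^{-1}X_2'(X_1\beta_1 + X_2\beta_2 + u) = \beta_2 + (X_2'X_2)^{-1}X_2'(X_1\beta_1 + u).$$

Hence, $E(\hat\beta_2) = \beta_2 + (X_2'X_2)^{-1}X_2'X_1\beta_1$ and the bias expression required
is $(X_2'X_2)^{-1}X_2'X_1\beta_1$.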


Part (c). Writing equation (2.7.1) more compactly we obtain

$$y = X\beta + u, \qquad (2.7.4)$$

where $X = (X_1 : X_2)$ and $\beta' = (\beta_1', \beta_2')$. The OLS estimator of $\beta$ is
$b = (X'X)^{-1}X'y$ or

$$\begin{bmatrix} b_1 \\ b_2 \end{bmatrix} = \begin{bmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{bmatrix}^{-1} \begin{bmatrix} X_1'y \\ X_2'y \end{bmatrix} \qquad (2.7.5)$$

where $b_1$ and $b_2$ are the OLS estimators of $\beta_1$ and $\beta_2$, respectively. Using
the expression for the inverse of a partitioned matrix given in question
(2.1) we obtain

$$\begin{bmatrix} b_1 \\ b_2 \end{bmatrix} = \begin{bmatrix} (X_1'Q_2X_1)^{-1} & -(X_1'Q_2X_1)^{-1}X_1'X_2(X_2'X_2)^{-1} \\ -(X_2'X_2)^{-1}X_2'X_1(X_1'Q_2X_1)^{-1} & (X_2'X_2)^{-1} + (X_2'X_2)^{-1}X_2'X_1(X_1'Q_2X_1)^{-1}X_1'X_2(X_2'X_2)^{-1} \end{bmatrix} \begin{bmatrix} X_1'y \\ X_2'y \end{bmatrix} \qquad (2.7.6)$$

Solving (2.7.6) for $b_1$, where $Q_2$ is defined in part (a), we find that

$$b_1 = (X_1'Q_2X_1)^{-1}X_1'y - (X_1'Q_2X_1)^{-1}X_1'X_2(X_2'X_2)^{-1}X_2'y = (X_1'Q_2X_1)^{-1}X_1'Q_2y \qquad (2.7.7)$$

But since $Q_2$ is symmetric and idempotent, (2.7.7) can be written as

$$b_1 = (X_1'Q_2'Q_2X_1)^{-1}X_1'Q_2'y = (X_1^{*\prime}X_1^*)^{-1}X_1^{*\prime}y \qquad (2.7.8)$$

where $X_1^* = Q_2X_1$. Equation (2.7.8) is the same as the expression for $\hat\beta_1$
in (2.7.3). Hence $b_1$, the OLS estimator of $\beta_1$ obtained from (2.7.1), is
identical to $\hat\beta_1$, the OLS estimator of $\beta_1$ obtained from (2.7.2).
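
The equality of the two estimators can be confirmed on a small data set. The following is a minimal sketch with made-up data and hypothetical helper names (`solve`, `ols`); exact rational arithmetic is used so that the two coefficients can be compared without rounding error.

```python
from fractions import Fraction as Fr


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [v - f * w for v, w in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(X, y):
    """Return (coefficients, residuals) from regressing y on the columns of X."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(X, y)) for i in range(k)]
    b = solve(xtx, xty)
    return b, [v - sum(bi * xi for bi, xi in zip(b, r)) for r, v in zip(X, y)]


# Hypothetical data: X1 is one regressor, X2 holds a constant and a trend.
x1 = [Fr(v) for v in (2, 1, 4, 3, 6, 8)]
X2 = [[Fr(1), Fr(t)] for t in range(1, 7)]
y = [Fr(v) for v in (3, 2, 7, 5, 9, 14)]

# b1 from the full regression of y on (X1 : X2), as in (2.7.5).
b_full, _ = ols([[a] + row for a, row in zip(x1, X2)], y)

# Coefficient from regressing y on X1* = Q2 X1, as in (2.7.8).
_, x1_star = ols(X2, x1)
b_star, _ = ols([[v] for v in x1_star], y)

print(b_full[0], b_star[0])
```

Because $X_1^{*\prime}X_2 = 0$, the second regression does not need $X_2$ at all to reproduce $b_1$.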


Solution 2.8 


Part (a). The Gauss-Doolittle pivotal condensation method for inverting
a matrix operates as follows. Consider the $n \times n$ matrix $A$ with elements
$a_{ij}$. An iterative scheme is performed in which in each iteration a new
matrix is formed from the matrix obtained in the previous iteration. For
the first iteration we form the new matrix $B$ from the matrix $A$ as follows.


1. Choose an element on the leading diagonal of $A$. Let this be $a_{kk}$. This
acts as a pivot.


2. In forming the elements of the matrix $B$ from the elements of $A$ we
must distinguish between four types of elements:
(i) the pivotal element of $B$ is formed as
$b_{kk} = 1/a_{kk}$
(ii) elements in the same row as the pivot are
$b_{kj} = a_{kj}/a_{kk}$ $(j \neq k)$
(iii) elements in the same column as the pivot are
$b_{ik} = -a_{ik}/a_{kk}$ $(i \neq k)$
(iv) the remaining elements are
$b_{ij} = a_{ij} - a_{ik}a_{kj}/a_{kk}$ $(i, j \neq k)$


For the second iteration we construct a new matrix by performing steps 1
and 2 on the matrix $B$ obtained in the first iteration. Iteration is continued
until every element on the leading diagonal has been used once as the
pivotal element. Thus there are in all $n$ iterations. It is best to choose the
largest unused diagonal element as the pivot. It is not necessary to work
in sequence from 1 to $n$. At the completion of the $n$th iteration the
resulting matrix is $A^{-1}$.

To illustrate the use of this method in computing OLS estimates of the
model $y = X\beta + u$, first we set up the cross-product matrix of all the
variables. Denote this by $A$. Thus

$$A = \begin{bmatrix} X'X & X'y \\ y'X & y'y \end{bmatrix} \qquad (2.8.1)$$

Next we perform one iteration of the pivotal condensation method with
$X'X$ as the pivot. Note this is a matrix and so the operations 2(i)-2(iv)
above must be adapted to matrix notation accordingly. The result is the
matrix $B$:

$$B = \begin{bmatrix} (X'X)^{-1} & (X'X)^{-1}X'y \\ -y'X(X'X)^{-1} & y'y - y'X(X'X)^{-1}X'y \end{bmatrix} = \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix} \qquad (2.8.2)$$

Now $B_{12}$ is a column of regression coefficients; it is the OLS estimator of
$\beta$. $B_{22}$ is a scalar and is the residual sum of squares from the regression
and $B_{11}$, when multiplied by $B_{22}/(T - k)$, is the estimated covariance
matrix of the OLS estimates, i.e.

$$B = \begin{bmatrix} V/s^2 & b \\ -b' & (T - k)s^2 \end{bmatrix}$$

where $b$ is the OLS estimator, $V$ is the covariance matrix of $b$ and
$s^2 = B_{22}/(T - k)$.
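
The pivoting rules can be sketched in a few lines. The example below is an illustration under stated assumptions: the function name `pivot` and the four observations are made up, the regressors are a constant and one variable, and the matrix pivot on $X'X$ is carried out as two successive scalar pivots, which gives the same tableau.

```python
from fractions import Fraction as Fr


def pivot(a, k):
    """One step of pivotal condensation on the diagonal element a[k][k]."""
    n, p = len(a), a[k][k]
    # Rule (iv) for every element; the pivot row and column are overwritten below.
    b = [[a[i][j] - a[i][k] * a[k][j] / p for j in range(n)] for i in range(n)]
    for j in range(n):
        b[k][j] = a[k][j] / p    # rule (ii): same row as the pivot
        b[j][k] = -a[j][k] / p   # rule (iii): same column as the pivot
    b[k][k] = 1 / p              # rule (i): the pivot itself
    return b


# Hypothetical data: y regressed on a constant and x.
x = [0, 1, 2, 3]
y = [1, 3, 4, 8]
cols = [[Fr(1)] * 4, [Fr(v) for v in x], [Fr(v) for v in y]]

# Cross-product matrix A of (2.8.1), with y as the last variable.
A = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]

# Pivot on the two regressors; the last row and column are left unpivoted.
tab = pivot(pivot(A, 0), 1)

b_hat = [tab[0][2], tab[1][2]]   # OLS coefficients (B12)
rss = tab[2][2]                  # residual sum of squares (B22)
xtx_inv = [row[:2] for row in tab[:2]]
print(b_hat, rss, xtx_inv)
```

Pivoting a second time on an element reverses the step, which is why a variable can be removed from the regression by the same operation.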


Part (b). It is possible to introduce a subset of variables instead of all the 
variables in X by partitioning X appropriately and using as the pivotal 
matrix the cross-products of the chosen variables. Additional variables, or 
blocks of variables, can be introduced by pivoting on these variables 
subsequently. The advantage of this computational design for stepwise 
regression is that it is possible to calculate the reduction in the residual 
sum of squares due to adding the extra variable(s), and test to see whether 
or not this is significant without having to compute a complete set of 
estimates and the corresponding covariance matrix of estimates. Thus 
initially only one number has to be calculated at each step. If that variable 
is found to be significant then the remaining elements in the tableau are 
calculated.

In effect, therefore, a variable is introduced into a regression by pivoting
on the corresponding element in the leading diagonal of the cross-product
matrix $A$. It is useful to note that it is also removed from the regression
by pivoting again on that element.


Part (c). For the model given, the matrix $A$ corresponding to (2.8.1) is

$$A = \begin{bmatrix} X_1'X_1 & X_1'X_2 & X_1'y \\ X_2'X_1 & X_2'X_2 & X_2'y \\ y'X_1 & y'X_2 & y'y \end{bmatrix} \qquad (2.8.3)$$

Introducing first $X_1$ we pivot on $X_1'X_1$ giving

$$B = \begin{bmatrix} (X_1'X_1)^{-1} & (X_1'X_1)^{-1}X_1'X_2 & (X_1'X_1)^{-1}X_1'y \\ -X_2'X_1(X_1'X_1)^{-1} & X_2'Q_1X_2 & X_2'Q_1y \\ -y'X_1(X_1'X_1)^{-1} & y'Q_1X_2 & y'Q_1y \end{bmatrix} \qquad (2.8.4)$$

where $Q_1 = I - X_1(X_1'X_1)^{-1}X_1'$. Denoting the $ij$th elements of $A$ and $B$
by $A_{ij}$ and $B_{ij}$ respectively, where $A$ is given by (2.8.3) and $B$ by (2.8.4),
we find that $B_{13} = (X_1'X_1)^{-1}X_1'y$ is the regression coefficient of $y$ on $X_1$
and $B_{33}$ is the corresponding residual sum of squares. The difference
$A_{33} - B_{33}$ is the explained sum of squares due to $X_1$.

Next we introduce $X_2$ by pivoting on $B_{22}$, obtaining the matrix $C$ with
elements

$$C_{11} = (X_1'X_1)^{-1} + (X_1'X_1)^{-1}X_1'X_2(X_2'Q_1X_2)^{-1}X_2'X_1(X_1'X_1)^{-1}$$
$$C_{12} = -(X_1'X_1)^{-1}X_1'X_2(X_2'Q_1X_2)^{-1}$$
$$C_{13} = (X_1'X_1)^{-1}X_1'y - (X_1'X_1)^{-1}X_1'X_2(X_2'Q_1X_2)^{-1}X_2'Q_1y$$
$$C_{21} = -(X_2'Q_1X_2)^{-1}X_2'X_1(X_1'X_1)^{-1}$$
$$C_{22} = (X_2'Q_1X_2)^{-1}$$
$$C_{23} = (X_2'Q_1X_2)^{-1}X_2'Q_1y$$
$$C_{31} = -y'X_1(X_1'X_1)^{-1} + y'Q_1X_2(X_2'Q_1X_2)^{-1}X_2'X_1(X_1'X_1)^{-1}$$
$$C_{32} = -y'Q_1X_2(X_2'Q_1X_2)^{-1}$$
$$C_{33} = y'Q_1y - y'Q_1X_2(X_2'Q_1X_2)^{-1}X_2'Q_1y \qquad (2.8.5)$$

so that

$$C = \begin{bmatrix} (X'X)^{-1} & (X'X)^{-1}X'y \\ -y'X(X'X)^{-1} & y'y - y'X(X'X)^{-1}X'y \end{bmatrix} \qquad (2.8.6)$$

where $X = (X_1 : X_2)$. Equation (2.8.5) is identical to (2.8.2) since all of
the explanatory variables have now been included in the regression. The
expression $y'Q_1X_2(X_2'Q_1X_2)^{-1}X_2'Q_1y$ is the additional explained sum of
squares due to $X_2$. Clearly, it can be calculated before we calculate the
other elements in the matrix $C$.


Part (d). First we introduce $x_1$ by 'pivoting' on the first element of the
moment matrix. In the resulting tableau the last element, the residual sum
of squares before $x_2$ is introduced, is 6.

Next we can introduce $x_2$ by pivoting on the second element in the
leading diagonal. However, since we only want the reduction in the sum
of squares due to introducing $x_2$ we need only calculate one element of
the new matrix, namely the last. It is $6 - (3 \times 3)/3 = 3$. The $F$ test
appropriate for testing $H_0: \alpha, \beta_1 \neq 0, \beta_2 = 0$ against $H_1: \alpha, \beta_1, \beta_2 \neq 0$ is
given by equation (2.3.5). It is

$$F = \frac{RSS_0 - RSS_1}{RSS_1} \cdot \frac{9}{1} = \frac{6 - 3}{3} \cdot \frac{9}{1} = 9$$

Since $F$ is distributed as $F_{1,9}$ we can use the $t$-distribution with 9 degrees
of freedom. The $t$-statistic is $\sqrt{F} = 3$. This is significant at the 2% level.
To test the significance of the coefficient of $x_1$ we first introduce $x_2$,
which gives a tableau whose last element is 3.75.
Next we introduce $x_1$ using the first element as pivot. The new last
element is $3.75 - (0.75^2)/(0.75) = 3$. The $t$-statistic we require is,
therefore,

$$t = \left[ \frac{3.75 - 3}{3} \cdot \frac{9}{1} \right]^{1/2} = 1.5$$

This is significant at the 10% level.
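
The arithmetic of part (d) uses only three entries of each tableau. The sketch below reproduces it from the numbers quoted in the solution (pivot 3, cross-product 3 and last element 6 for $x_2$; pivot 0.75, cross-product 0.75 and last element 3.75 for $x_1$; 9 degrees of freedom). The function name is a made-up label for rule 2(iv) applied to the last element.

```python
import math


def last_element_after_pivot(pivot_elem, cross, last):
    """Rule 2(iv) applied to the bottom right-hand element of the tableau."""
    return last - cross * cross / pivot_elem


df = 9  # degrees of freedom of the unrestricted regression, as in the text

# Adding x2 after x1: residual sum of squares falls from 6 to 3.
rss_x2 = last_element_after_pivot(3.0, 3.0, 6.0)
f_x2 = (6.0 - rss_x2) / rss_x2 * df / 1
t_x2 = math.sqrt(f_x2)

# Adding x1 after x2: residual sum of squares falls from 3.75 to 3.
rss_x1 = last_element_after_pivot(0.75, 0.75, 3.75)
t_x1 = math.sqrt((3.75 - rss_x1) / rss_x1 * df / 1)

print(rss_x2, f_x2, t_x2, rss_x1, t_x1)
```

Both orders of introduction end with the same residual sum of squares, as they must, since the final regression contains the same variables.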


Solution 2.9 


Part (a). First we rewrite equations (2.9.1) and (2.9.2) more compactly as

$$y = X_1\beta_1 + X_2\beta_2 + u \qquad (2.9.5)$$

and

$$y^* = X_1^*\beta_1 + u^* \qquad (2.9.6)$$

respectively, where $y$, $X_1$, $X_2$ and $u$ contain the $T$ observations on the
variables and disturbances of (2.9.1), and $y^*$, $X_1^*$ and $u^*$ those of (2.9.2).
$y^*$ and $X_1^*$ are now the residuals of the regression of $y$ on $X_2$ and $X_1$ on
$X_2$, respectively. Thus $y^* = Q_2y$ and $X_1^* = Q_2X_1$ where
$Q_2 = I_T - X_2(X_2'X_2)^{-1}X_2'$.

The least squares estimators of $\beta_1$ and $\beta_2$ from (2.9.5) are

$$\begin{bmatrix} b_1 \\ b_2 \end{bmatrix} = \begin{bmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{bmatrix}^{-1} \begin{bmatrix} X_1'y \\ X_2'y \end{bmatrix}$$

Using the results of question (2.7) we find that

$$b_1 = (X_1'Q_2X_1)^{-1}X_1'Q_2y = (X_1'Q_2'Q_2X_1)^{-1}X_1'Q_2'Q_2y = (X_1^{*\prime}X_1^*)^{-1}X_1^{*\prime}y^* \qquad (2.9.7)$$

But (2.9.7) is the expression for the regression coefficients in the
regression of $y^*$ on $X_1^*$. We have shown, therefore, that the OLS estimator
of $\beta_1$ obtained from (2.9.1) is identical to that obtained from (2.9.2).


Part (b). In general we can obtain an equation like (2.9.4) from (2.9.3) by 
eliminating a set of variables through projecting the variables in the 
original linear model onto the space orthogonal to the range space of the 
variables to be eliminated. This is accomplished by pre-multiplying the 
original model by the appropriate projection matrix. Equations (2.9.3) 
and (2.9.4) can be written in matrix form as (2.9.5) and (2.9.6), 
respectively. To reduce (2.9.5) to (2.9.6) we pre-multiply (2.9.5) by
$Q_2 = I - X_2(X_2'X_2)^{-1}X_2'$ to get

$$Q_2y = Q_2X_1\beta_1 + Q_2X_2\beta_2 + Q_2u \qquad (2.9.8)$$

Now $Q_2X_2 = 0$, therefore (2.9.8) becomes

$$y^* = X_1^*\beta_1 + u^* \qquad (2.9.9)$$

where $y^*$, $X_1^*$ and $u^*$ are defined as in part (a) above; we note that $y^*$ and
$X_1^*$, residuals from the regressions on $X_2$, can also be interpreted as
detrended series.


Part (c). The test statistic for $H_0: \beta_1 = 0$ from (2.9.1) is a special case of
a test for the significance of a subset of regression coefficients. If in
equation (2.9.5) $X_1$ is a $T \times k_1$ matrix and $X_2$ is a $T \times k_2$ matrix, then
the required test statistic for $H_0: \beta_1 = 0, \beta_2 \neq 0$ against $H_1: \beta_1, \beta_2 \neq 0$ is
given by equation (2.3.5) as

$$F = \frac{RSS_0 - RSS_1}{RSS_1} \cdot \frac{T - k_1 - k_2}{k_1} \qquad (2.9.10)$$

where $RSS_0$ is the residual sum of squares on $H_0$ and is obtained from
the regression of $y$ on $X_2$, and $RSS_1$ is the residual sum of squares on $H_1$
and is obtained from a regression of $y$ on $X_1$ and $X_2$. Thus $RSS_0 = y'Q_2y$
and $RSS_1 = y'[I - X(X'X)^{-1}X']y$, where $X = (X_1 : X_2)$. $F$ is distributed
as $F_{k_1, T-k_1-k_2}$. In the present case, equation (2.9.1), $k_1 = k_2 = 1$.

The test statistic for $H_0: \beta_1 = 0$ against $H_1: \beta_1 \neq 0$ from (2.9.2) is a
special case of the test of significance of all of the regression coefficients.
From question (2.2) the required test statistic in the present case is

$$F^* = ESS^*/(s^{*2}k_1) \qquad (2.9.11)$$

where $ESS^*$ is the explained sum of squares in the regression of $y^*$ on $X_1^*$;
$s^{*2}$ is an estimate of $\sigma^2$ obtained on $H_1$ and $X_1^*$ is assumed to be a $T^* \times k_1$
matrix. Thus

$$ESS^* = y^{*\prime}X_1^*(X_1^{*\prime}X_1^*)^{-1}X_1^{*\prime}y^* = y'Q_2X_1(X_1'Q_2X_1)^{-1}X_1'Q_2y \qquad (2.9.12)$$

and

$$s^{*2} = RSS^*/(T^* - k_1) = (y^{*\prime}y^* - ESS^*)/(T^* - k_1) = (y'Q_2y - y'Q_2X_1(X_1'Q_2X_1)^{-1}X_1'Q_2y)/(T^* - k_1) \qquad (2.9.13)$$

where $RSS^*$ is the residual sum of squares from the regression of $y^*$ on
$X_1^*$ and $T^*$ is chosen to equal $T - k_2$ because $y^{*\prime}y^*/\sigma^2$ is distributed as
$\chi^2_{T-k_2}$ and not $\chi^2_T$. $F^*$, defined in equation (2.9.11), is distributed as
$F_{k_1, T^*-k_1}$. In the case of equation (2.9.2), $k_1 = 1$.

Comparing the bottom right-hand element of equation (2.8.5) with
that of equation (2.8.6), with the roles of $X_1$ and $X_2$ interchanged, we can
see that the residual sum of squares from the regression of $y$ on $X_1$ and $X_2$ is

$$y'[I - X(X'X)^{-1}X']y = y'Q_2y - y'Q_2X_1(X_1'Q_2X_1)^{-1}X_1'Q_2y \qquad (2.9.14)$$

Hence, from (2.9.14),

$$RSS_0 - RSS_1 = y'Q_2y - y'[I - X(X'X)^{-1}X']y = y'Q_2X_1(X_1'Q_2X_1)^{-1}X_1'Q_2y$$

and so (2.9.10), the test statistic using (2.9.1), becomes

$$F = \frac{y'Q_2X_1(X_1'Q_2X_1)^{-1}X_1'Q_2y}{y'Q_2y - y'Q_2X_1(X_1'Q_2X_1)^{-1}X_1'Q_2y} \cdot \frac{T - k_1 - k_2}{k_1} \qquad (2.9.15)$$

If we substitute (2.9.12) and (2.9.13) into (2.9.11), the test statistic
using (2.9.2), with $T^* = T - k_2$, we obtain once more equation (2.9.15).
The two test statistics are clearly identical.


Solution 2.10 


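Part (a). The coefficient of determination from the first regression is given
by

$$R_1^2 = 1 - \frac{\sum_{t=1}^T \hat u_t^2}{\sum_{t=1}^T (y_t - \bar y)^2}$$

where $\bar y$ is the sample mean of $y_1, \ldots, y_T$,

$$\hat u_t = y_t - \hat\beta_0 - \hat\beta_1 x_{1t} - \cdots - \hat\beta_{k+1} x_{k+1,t}$$

and the $\hat\beta_i$ $(i = 0, \ldots, k + 1)$ are the estimated coefficients in the regression
which minimises the sum of squares

$$Q(\beta_0, \beta_1, \ldots, \beta_{k+1}) = \sum_{t=1}^T (y_t - \beta_0 - \beta_1 x_{1t} - \cdots - \beta_{k+1} x_{k+1,t})^2$$

Thus the inequality

$$Q(\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_{k+1}) \leq Q(\beta_0, \beta_1, \ldots, \beta_{k+1}) \qquad (2.10.1)$$

is satisfied for all values of $\beta_0, \beta_1, \ldots, \beta_{k+1}$.

From the second regression we obtain

$$R_2^2 = 1 - \frac{\sum_{t=1}^T \tilde u_t^2}{\sum_{t=1}^T (y_t - \bar y)^2}, \qquad \tilde u_t = y_t - \tilde\beta_0 - \tilde\beta_1 x_{1t} - \cdots - \tilde\beta_k x_{kt}$$

However,

$$\sum_{t=1}^T \tilde u_t^2 = Q(\tilde\beta_0, \tilde\beta_1, \ldots, \tilde\beta_k, 0)$$

which from (2.10.1) is not less than $Q(\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_{k+1})$. Hence

$$\sum_{t=1}^T \tilde u_t^2 \geq \sum_{t=1}^T \hat u_t^2$$

and, therefore, $R_1^2 \geq R_2^2$ as required.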


Part (b). To find an exact relation between $R_1^2$ and $R_2^2$ we must examine
the residual sums of squares $\sum_{t=1}^T \hat u_t^2$ and $\sum_{t=1}^T \tilde u_t^2$ more closely. First we
introduce the notation

$$X = \begin{bmatrix} 1 & x_{11} & \cdots & x_{k1} \\ \vdots & \vdots & & \vdots \\ 1 & x_{1T} & \cdots & x_{kT} \end{bmatrix}, \qquad x = \begin{bmatrix} x_{k+1,1} \\ \vdots \\ x_{k+1,T} \end{bmatrix}, \qquad y = \begin{bmatrix} y_1 \\ \vdots \\ y_T \end{bmatrix}$$

and

$$Z = [X : x]$$

Then

$$\sum_{t=1}^T \hat u_t^2 = y'(I - P_1)y \quad \text{and} \quad \sum_{t=1}^T \tilde u_t^2 = y'(I - P_2)y$$

where

$$P_1 = Z(Z'Z)^{-1}Z' \quad \text{and} \quad P_2 = X(X'X)^{-1}X'$$

Now

$$P_1 = [X : x] \begin{bmatrix} X'X & X'x \\ x'X & x'x \end{bmatrix}^{-1} \begin{bmatrix} X' \\ x' \end{bmatrix} \qquad (2.10.2)$$

which, from the inverse of a partitioned matrix (see the solution to 2.1
above), is

$$P_1 = [X : x] \begin{bmatrix} (X'X)^{-1} + (X'X)^{-1}X'xBx'X(X'X)^{-1} & -(X'X)^{-1}X'xB \\ -Bx'X(X'X)^{-1} & B \end{bmatrix} \begin{bmatrix} X' \\ x' \end{bmatrix}$$

where $B$ is the scalar

$$B = [x'x - x'X(X'X)^{-1}X'x]^{-1} \qquad (2.10.3)$$

Thus, expanding (2.10.2), we obtain

$$P_1 = X(X'X)^{-1}X' + [I - X(X'X)^{-1}X']xBx'[I - X(X'X)^{-1}X'] = P_2 + W, \text{ say.}$$

Now, setting $Q = I - X(X'X)^{-1}X'$, we have

$$W = QxBx'Q = \frac{1}{x'Qx}Qxx'Q$$

in view of (2.10.3). We note that $x'Qx > 0$. For, if this were not so, $Qx$
would equal zero and we could write $x$ as a linear combination of the
columns of $X$ in which case $Z'Z$ would be a singular matrix and the first
regression would be impossible.

Writing $m_{yy} = \sum_{t=1}^T (y_t - \bar y)^2$, we now have

$$R_1^2 = 1 - \frac{y'(I - P_1)y}{m_{yy}} = 1 - \frac{y'(I - P_2)y}{m_{yy}} + \frac{y'Wy}{m_{yy}} = R_2^2 + \frac{y'Wy}{m_{yy}}$$

But

$$y'Wy = \frac{1}{x'Qx}(y'Qx)^2 \geq 0$$

so that $R_1^2 \geq R_2^2$ as shown above. However, we now see that equality
occurs only when $y'Wy = 0$. That is, when $y'Qx = 0$ or when $y$ is
orthogonal to the residuals from the regression of $x$ on $X$.
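
The exact relation $R_1^2 = R_2^2 + (y'Qx)^2/(x'Qx \, m_{yy})$ can be verified directly. The following is a minimal sketch with made-up data and hypothetical helper names (`solve`, `ols`); rational arithmetic is used so that the identity holds exactly rather than to rounding error.

```python
from fractions import Fraction as Fr


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [v - f * w for v, w in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(X, y):
    """Return (coefficients, residuals) from regressing y on the columns of X."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(X, y)) for i in range(k)]
    b = solve(xtx, xty)
    return b, [v - sum(bi * xi for bi, xi in zip(b, r)) for r, v in zip(X, y)]


# Hypothetical data: X has a constant and one regressor; x is the added variable.
X = [[Fr(1), Fr(t)] for t in range(1, 7)]
x = [Fr(v) for v in (2, 1, 4, 3, 6, 8)]
y = [Fr(v) for v in (3, 2, 7, 5, 9, 14)]
Z = [row + [v] for row, v in zip(X, x)]

ybar = sum(y) / len(y)
m_yy = sum((v - ybar) ** 2 for v in y)

rss1 = sum(u * u for u in ols(Z, y)[1])    # first regression (on Z)
rss2 = sum(u * u for u in ols(X, y)[1])    # second regression (on X)
r1, r2 = 1 - rss1 / m_yy, 1 - rss2 / m_yy

qx = ols(X, x)[1]                          # Qx: residuals of x on X
yQx = sum(a * b for a, b in zip(y, qx))
xQx = sum(b * b for b in qx)
gain = yQx ** 2 / xQx                      # y'Wy

print(r1, r2, gain / m_yy)
```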


Solution 2.11 


\ 
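Part (a). From $y = Zb + u^*$ and $b = (Z'Z)^{-1}Z'y$, we have

$$u^* = y - Zb = y - Z(Z'Z)^{-1}Z'y = [I - Z(Z'Z)^{-1}Z']y$$

so that

$$Z'u^* = Z'[I - Z(Z'Z)^{-1}Z']y = 0 \qquad (2.11.2)$$

Thus

$$y'y = (Zb + u^*)'(Zb + u^*) = b'Z'Zb + b'Z'u^* + u^{*\prime}Zb + u^{*\prime}u^* = b'Z'Zb + u^{*\prime}u^* \qquad (2.11.3)$$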


Part (b). We take as an example the case where $Z$ has a single column and
there are three observations. The observations and the regression line in
this case are given in figure 2.2.

Figure 2.2 (the observations and the line $y + z = 1$)

The observations all lie on the line $y + z = 1$ and forcing the regression
line through the origin gives a very poor fit. To see why $R^2$ as defined by
(2.11.1) can be negative in such cases we write it in the form

$$R^2 = \frac{\sum_{t=1}^T (y_t - \bar y)^2 - \sum_{t=1}^T u_t^{*2}}{\sum_{t=1}^T (y_t - \bar y)^2} = \frac{\sum_{t=1}^T y_t^2 - (\sum_{t=1}^T y_t)^2/T - \sum_{t=1}^T u_t^{*2}}{\sum_{t=1}^T (y_t - \bar y)^2} = \frac{\sum_{t=1}^T \hat y_t^2 - (\sum_{t=1}^T y_t)^2/T}{\sum_{t=1}^T (y_t - \bar y)^2} \qquad (2.11.4)$$

since, from (2.11.3), we see that

$$\sum_{t=1}^T \hat y_t^2 = b'Z'Zb$$

where $\hat y_t$ is the estimated or calculated value of $y_t$ from the regression.
Now in the usual case where there is a constant term in the regression
(and hence one column of $Z$ is the sum vector) the orthogonality
condition (2.11.2) implies that

$$Z'y = Z'\hat y$$

where $\hat y = Zb$, and hence (taking that row of $Z'$ which is the sum vector)

$$\sum_{t=1}^T \hat y_t = \sum_{t=1}^T y_t \qquad (2.11.5)$$

Then, in the numerator of (2.11.4) we would have

$$\sum_{t=1}^T \hat y_t^2 - \Big(\sum_{t=1}^T y_t\Big)^2 \Big/ T = \sum_{t=1}^T \hat y_t^2 - \Big(\sum_{t=1}^T \hat y_t\Big)^2 \Big/ T = \sum_{t=1}^T (\hat y_t - \bar y)^2 \geq 0$$

But in the present case, where there is no constant term in the regression,
(2.11.5) does not necessarily hold and consequently the numerator of
(2.11.4) cannot be written as a sum of squares and is, therefore, not
necessarily non-negative. In fact, for the data of this example, $R^2$ as
defined by (2.11.1) is negative.

Part (c). An alternative statistic to $R^2$ in this case is the $R_2^2$ defined in
the question. Thus

$$R_2^2 = 1 - \frac{\sum_{t=1}^T u_t^{*2}}{\sum_{t=1}^T y_t^2}$$

Now, from (2.11.3),

$$R_2^2 = 1 - \frac{u^{*\prime}u^*}{y'y} = \frac{b'Z'Zb}{y'y} \geq 0$$

so that

$$0 \leq R_2^2 \leq 1$$
For the data in part (b) we have 
44 1 ; 
Ree ee ee 
5 25 


Part (d). The fact that $R^2$ can be negative when the constant term is
suppressed in a regression holds even if the true model itself does not
involve a constant. For example, if $y = Z\beta + u$ were the true model in
the example of part (b), the observations given in that example could still
have been generated, viz from the disturbance sequence


iy lL -2! §3 
Up mle 2 
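
A small numerical illustration may help here. The three observations below are hypothetical (they are not the data of part (b), which are not reproduced here) but, like them, lie on the line $y + z = 1$. The sketch fits the regression through the origin and computes both the usual $R^2$ about the mean and the alternative measure about the origin.

```python
from fractions import Fraction as Fr

# Hypothetical observations on the line y + z = 1.
z = [Fr(-1), Fr(0), Fr(1)]
y = [1 - v for v in z]

# Regression through the origin: b = sum(z*y) / sum(z*z).
b = sum(a * c for a, c in zip(z, y)) / sum(a * a for a in z)
resid = [c - b * a for a, c in zip(z, y)]
rss = sum(u * u for u in resid)

ybar = sum(y) / len(y)
r2_centred = 1 - rss / sum((c - ybar) ** 2 for c in y)   # about the mean
r2_origin = 1 - rss / sum(c * c for c in y)              # about the origin

print(b, rss, r2_centred, r2_origin)
```

For these data the residuals do not sum to zero, so (2.11.5) fails and the centred measure falls below zero, while the measure about the origin stays between 0 and 1.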


Solution 2.12 


Part (a). The GLS estimator can be derived by first transforming the
model (2.12.1) to produce a linear model with a scalar diagonal error
covariance matrix. Then we apply the Gauss-Markov theorem to the new
model. The resulting estimator written in terms of the original variables is
the GLS estimator.

Consider a $T \times T$ non-singular matrix $R$ which satisfies $R'R = \Sigma^{-1}$. One
such matrix is obtained by writing $\Sigma = Q\Lambda Q'$ where $Q$ is a $T \times T$ matrix
of the eigenvectors of $\Sigma$, $\Lambda$ is a diagonal matrix of the eigenvalues
of $\Sigma$ (all positive when $\Sigma$ is non-singular) and $Q'Q = I$. Hence
$\Sigma^{-1} = Q\Lambda^{-1}Q' = Q\Lambda^{-1/2}\Lambda^{-1/2}Q'$. If we define
$R = \Lambda^{-1/2}Q'$, then $R'R = \Sigma^{-1}$ as required.

Pre-multiplying (2.12.1) by $R$ we obtain

$$Ry = RX\beta + Ru$$

or

$$y^* = X^*\beta + u^* \qquad (2.12.3)$$

where $y^* = Ry$, $X^* = RX$, $u^* = Ru$, $E(u^*) = 0$ and
$E(u^*u^{*\prime}) = RE(uu')R' = R\Sigma R' = R(R'R)^{-1}R' = I$. Thus (2.12.3) is a linear model whose error
term has mean zero and a scalar diagonal covariance matrix.
From the Gauss-Markov theorem (Theil, 1971, pp.119-121), the
best linear unbiased estimator (BLUE) of $\beta$ in (2.12.3) is the OLS estimator

$$\hat\beta = (X^{*\prime}X^*)^{-1}X^{*\prime}y^* = (X'R'RX)^{-1}X'R'Ry = (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}y \qquad (2.12.4)$$

which has a covariance matrix

$$\hat V = (X^{*\prime}X^*)^{-1} = (X'R'RX)^{-1} = (X'\Sigma^{-1}X)^{-1} \qquad (2.12.5)$$

To see that (2.12.4) is a BLUE of $\beta$ consider another linear
estimator

$$b^* = [(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1} + D]y = \beta + DX\beta + [(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1} + D]u.$$

For $b^*$ to be an unbiased linear estimator we require that

$$E(b^*) = \beta + DX\beta = \beta$$

implying that $DX = 0$. The covariance matrix of $b^*$ is

$$V^* = E(b^* - \beta)(b^* - \beta)' = [(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1} + D]\Sigma[(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1} + D]' = (X'\Sigma^{-1}X)^{-1} + D\Sigma D'$$

since the cross-product terms $(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}\Sigma D' = (X'\Sigma^{-1}X)^{-1}X'D' = 0$. As $D\Sigma D'$ is a
positive semi-definite matrix, it follows that choosing $D = 0$ minimises $V^*$
and makes $b^* = \hat\beta$. See Theil (1971, pp.237-241).
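
The equivalence between OLS on the transformed model and the direct formula (2.12.4) can be checked in a special case. The sketch below assumes a diagonal $\Sigma$ (pure heteroskedasticity), for which $R = \mathrm{diag}(1/\sigma_t)$ satisfies $R'R = \Sigma^{-1}$ without an eigenvalue decomposition; the data, the standard deviations and the helper names are made up for illustration.

```python
from fractions import Fraction as Fr


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [v - f * w for v, w in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def weighted_ls(X, y, w):
    """Solve (X'WX) b = X'Wy where W is diagonal with elements w."""
    k = len(X[0])
    xwx = [[sum(wt * r[i] * r[j] for r, wt in zip(X, w)) for j in range(k)]
           for i in range(k)]
    xwy = [sum(wt * r[i] * v for r, v, wt in zip(X, y, w)) for i in range(k)]
    return solve(xwx, xwy)


# Hypothetical data with Sigma = diag(sd_t ** 2).
sd = [Fr(v) for v in (1, 2, 1, 2, 3)]
X = [[Fr(1), Fr(t)] for t in range(1, 6)]
y = [Fr(v) for v in (2, 3, 7, 6, 10)]
ones = [Fr(1)] * len(y)

# Direct formula (X' Sigma^{-1} X)^{-1} X' Sigma^{-1} y.
b_gls = weighted_ls(X, y, [1 / s ** 2 for s in sd])

# OLS on the transformed model y* = Ry, X* = RX with R = diag(1 / sd_t).
X_star = [[v / s for v in row] for row, s in zip(X, sd)]
y_star = [v / s for v, s in zip(y, sd)]
b_transformed = weighted_ls(X_star, y_star, ones)

b_ols = weighted_ls(X, y, ones)
print(b_gls, b_transformed, b_ols)
```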
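Part (b). The OLS estimator of $\beta$ is

$$b = (X'X)^{-1}X'y = \beta + (X'X)^{-1}X'u.$$

Therefore, $E(b) = \beta$ and the covariance matrix of $b$ is

$$V = E(b - \beta)(b - \beta)' = E[(X'X)^{-1}X'uu'X(X'X)^{-1}] = (X'X)^{-1}X'\Sigma X(X'X)^{-1}. \qquad (2.12.6)$$

Let $x_j$ denote the $j$th column of $X$ $(j = 1, \ldots, k)$, then if $x_j$ is an
eigenvector of $\Sigma$ we have $\Sigma x_j = \lambda_j x_j$, where $\lambda_j$ is the $j$th eigenvalue of $\Sigma$, or
$\Sigma X = X\Lambda_1$, where $\Lambda_1$ is a diagonal matrix whose $jj$th element is $\lambda_j$. From
part (a)

$$\Sigma = Q\Lambda Q' = (X : Z) \begin{bmatrix} \Lambda_1 & 0 \\ 0 & \Lambda_2 \end{bmatrix} \begin{bmatrix} X' \\ Z' \end{bmatrix} \qquad (2.12.7)$$

where the columns of $Z$ are the remaining $T - k$ eigenvectors of $\Sigma$ and
the diagonal elements of $\Lambda_2$ are the corresponding eigenvalues. Moreover,

$$\Sigma^{-1} = Q\Lambda^{-1}Q' = (X : Z) \begin{bmatrix} \Lambda_1^{-1} & 0 \\ 0 & \Lambda_2^{-1} \end{bmatrix} \begin{bmatrix} X' \\ Z' \end{bmatrix} \qquad (2.12.8)$$

and

$$Q'Q = \begin{bmatrix} X'X & X'Z \\ Z'X & Z'Z \end{bmatrix} = \begin{bmatrix} I_k & 0 \\ 0 & I_{T-k} \end{bmatrix} \qquad (2.12.9)$$

Consider now the covariance matrix of the GLS estimator, equation
(2.12.5). Using (2.12.8) and (2.12.9) we have

$$\hat V = (X'\Sigma^{-1}X)^{-1} = \left[ X'(X : Z) \begin{bmatrix} \Lambda_1^{-1} & 0 \\ 0 & \Lambda_2^{-1} \end{bmatrix} \begin{bmatrix} X' \\ Z' \end{bmatrix} X \right]^{-1} = \left[ (I : 0) \begin{bmatrix} \Lambda_1^{-1} & 0 \\ 0 & \Lambda_2^{-1} \end{bmatrix} \begin{bmatrix} I \\ 0 \end{bmatrix} \right]^{-1} = (\Lambda_1^{-1})^{-1} = \Lambda_1 \qquad (2.12.10)$$

Using (2.12.7) and (2.12.9), the covariance matrix of the OLS
estimator, equation (2.12.6), can be shown to be

$$V = (X'X)^{-1}X'\Sigma X(X'X)^{-1} = X'(X : Z) \begin{bmatrix} \Lambda_1 & 0 \\ 0 & \Lambda_2 \end{bmatrix} \begin{bmatrix} X' \\ Z' \end{bmatrix} X = (I : 0) \begin{bmatrix} \Lambda_1 & 0 \\ 0 & \Lambda_2 \end{bmatrix} \begin{bmatrix} I \\ 0 \end{bmatrix} = \Lambda_1 \qquad (2.12.11)$$

Thus $V = \hat V$ and hence the OLS estimator is as efficient as the GLS
estimator. See also Anderson (1971, pp.18-20).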


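Part (c). From the results of part (b), OLS will be efficient if the columns
of $X$ are eigenvectors of $\Sigma$. Suppose $x$ is an eigenvector of $\Sigma$ with elements
$x_1, \ldots, x_T$ and $\lambda$ is the corresponding eigenvalue, i.e. $\Sigma x = \lambda x$. The
following results are similar to those of Anderson (1971, pp.276-293).
From $\Sigma x = \lambda x$ we have

$$\sigma^2(x_1 - x_2) = \lambda x_1 \qquad (2.12.12)$$

$$\sigma^2(-x_{t-1} + 2x_t - x_{t+1}) = \lambda x_t \qquad (t = 2, \ldots, T - 1) \qquad (2.12.13)$$

$$\sigma^2(-x_{T-1} + x_T) = \lambda x_T \qquad (2.12.14)$$

Equation (2.12.13) can also be written

$$x_{t+1} + (\lambda\sigma^{-2} - 2)x_t + x_{t-1} = 0 \qquad (t = 2, \ldots, T - 1) \qquad (2.12.15)$$

which is a second-order difference equation. The solution is

$$x_t = c_1\alpha_1^t + c_2\alpha_2^t \qquad (2.12.16)$$

where $\alpha_1$ and $\alpha_2$ are the roots of the characteristic polynomial equation

$$\alpha^2 - 2\theta\alpha + 1 = 0 \qquad (2.12.17)$$

for $\theta = 1 - \lambda\sigma^{-2}/2$. The roots of (2.12.17) are $\theta \pm (\theta^2 - 1)^{1/2}$; hence
$\alpha_1\alpha_2 = [\theta + (\theta^2 - 1)^{1/2}][\theta - (\theta^2 - 1)^{1/2}] = 1$ and $\alpha_1 + \alpha_2 = 2\theta$.
It follows that we can write (2.12.16) as

$$x_t = c_1\alpha^t + c_2\alpha^{-t} \qquad (2.12.18)$$

where $\alpha_1 = \alpha_2^{-1} = \alpha$ and $2\theta = \alpha + \alpha^{-1}$. Substituting (2.12.18) into
(2.12.12) we obtain

$$0 = (1 - \lambda\sigma^{-2})x_1 - x_2 = (2\theta - 1)x_1 - x_2$$
$$= (\alpha + \alpha^{-1} - 1)(c_1\alpha + c_2\alpha^{-1}) - (c_1\alpha^2 + c_2\alpha^{-2})$$
$$= c_1(1 - \alpha) + c_2(1 - \alpha^{-1})$$

Therefore $c_2 = -(1 - \alpha)c_1/(1 - \alpha^{-1}) = \alpha c_1$ and hence if we take
$c_1 = \alpha^{-1/2}$ we find that $c_2 = \alpha^{1/2}$ and

$$x_t = \alpha^{t-1/2} + \alpha^{-(t-1/2)} \qquad (2.12.19)$$

From (2.12.14)

$$0 = -x_{T-1} + (1 - \lambda\sigma^{-2})x_T = -x_{T-1} + (2\theta - 1)x_T$$
$$= -(\alpha^{T-3/2} + \alpha^{-T+3/2}) + (\alpha + \alpha^{-1} - 1)(\alpha^{T-1/2} + \alpha^{-T+1/2})$$
$$= \alpha^{T+1/2} - \alpha^{T-1/2} - \alpha^{-T+1/2} + \alpha^{-T-1/2}$$
$$= (\alpha^T - \alpha^{-T})(\alpha^{1/2} - \alpha^{-1/2})$$

Thus either $\alpha = 1$ or $\alpha^{2T} = 1$. The roots of $\alpha^{2T} = 1$ are $e^{i2\pi s/2T} = e^{i\pi s/T}$
for $s = 0, 1, \ldots, 2T - 1$. It follows that

$$\theta = (e^{i\pi s/T} + e^{-i\pi s/T})/2 = \cos(\pi s/T)$$

and hence

$$\lambda = 2\sigma^2[1 - \cos(\pi s/T)] \qquad (2.12.20)$$

But as $\cos(\pi s/T) = \cos(2\pi - \pi s/T)$, the values $s$ and $2T - s$ give the same
root, so there are $T + 1$ different values of $\lambda$ for which (2.12.20) holds,
corresponding to $s = 0, 1, \ldots, T$. However, we can rule out one of them
since if $s = T$ then $\theta = -1$, $\alpha = -1$ and $x_t = 0$ which is inadmissible.
The remaining $\lambda$'s are the required $T$ eigenvalues of $\Sigma$. The corresponding
eigenvectors are obtained from (2.12.19). Thus for $t = 1, \ldots, T$ and
$s = 0, 1, \ldots, T - 1$

$$x_t = \alpha^{t-1/2} + \alpha^{-(t-1/2)} = e^{i(2t-1)\pi s/2T} + e^{-i(2t-1)\pi s/2T} = 2\cos[(2t - 1)\pi s/2T]$$

Since an eigenvector is defined only up to a scalar multiple, the $s$th
eigenvector can, therefore, be taken as

$$x = \{\cos(\pi s/2T), \cos(3\pi s/2T), \ldots, \cos[(2T - 1)\pi s/2T]\}'$$

Thus, to form $X$ we must choose $k$ of these $T$ eigenvectors.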
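
The eigenvalues (2.12.20) and the cosine eigenvectors can be checked numerically. The sketch below builds $\Sigma$ from equations (2.12.12) to (2.12.14) for an arbitrary choice of $T = 6$ and $\sigma^2 = 1$ (both values are assumptions made for the illustration) and measures the largest discrepancy between $\Sigma x$ and $\lambda x$ over the $T$ admissible values of $s$.

```python
import math

T, sigma2 = 6, 1.0

# Sigma is tridiagonal: (1, 2, ..., 2, 1) * sigma2 on the diagonal, -sigma2 beside it.
Sigma = [[0.0] * T for _ in range(T)]
for t in range(T):
    Sigma[t][t] = sigma2 * (1.0 if t in (0, T - 1) else 2.0)
    if t + 1 < T:
        Sigma[t][t + 1] = Sigma[t + 1][t] = -sigma2

max_err = 0.0
eigenvalues = []
for s in range(T):
    lam = 2 * sigma2 * (1 - math.cos(math.pi * s / T))            # (2.12.20)
    x = [math.cos((2 * t - 1) * math.pi * s / (2 * T)) for t in range(1, T + 1)]
    Sx = [sum(Sigma[i][j] * x[j] for j in range(T)) for i in range(T)]
    max_err = max(max_err, max(abs(a - lam * b) for a, b in zip(Sx, x)))
    eigenvalues.append(lam)

print(eigenvalues, max_err)
```

The case $s = 0$ gives $\lambda = 0$ with a constant eigenvector, which is the singularity of $\Sigma$ noted at the end of this solution.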

$\Sigma$ as defined in (2.12.2) can be obtained if the errors $u_t$ $(t = 1, \ldots, T)$
are generated by the moving average process $u_t = \epsilon_t - \epsilon_{t-1}$ where $E(\epsilon_t) = 0$,
$E(\epsilon_t^2) = \sigma^2$, $E(\epsilon_t\epsilon_s) = 0$ for $s \neq t$ and $\epsilon_0 = \epsilon_T = 0$. For $t = 2, \ldots, T - 1$

$$E(u_t^2) = E(\epsilon_t - \epsilon_{t-1})^2 = E(\epsilon_t^2) - 2E(\epsilon_t\epsilon_{t-1}) + E(\epsilon_{t-1}^2) = 2\sigma^2$$

For $t = 2, \ldots, T$

$$E(u_tu_{t-1}) = E(\epsilon_t - \epsilon_{t-1})(\epsilon_{t-1} - \epsilon_{t-2}) = E(\epsilon_t\epsilon_{t-1}) - E(\epsilon_t\epsilon_{t-2}) - E(\epsilon_{t-1}^2) + E(\epsilon_{t-1}\epsilon_{t-2}) = -\sigma^2$$

For $s \geq 2$, $E(u_tu_{t-s}) = 0$. Further, $E(u_1^2) = E(\epsilon_1^2) = \sigma^2$ and $E(u_T^2) =
E(\epsilon_{T-1}^2) = \sigma^2$. Combining these results we obtain $\Sigma$ as defined in (2.12.2).

It should be noted, however, that $\Sigma$ is a singular matrix (the columns
and rows sum to zero) and hence the GLS estimator cannot be computed
from (2.12.4). A generalised inverse of $\Sigma$ must be used in place of $\Sigma^{-1}$.
But since OLS is fully efficient and presents no computational problems
we can use this estimator instead.


Solution 2.13 


Part (a). Let the model be written in the sample period as

$$y = X\beta + u \qquad (2.13.1)$$

and in the forecast period as

$$y_F = X_F\beta + u_F \qquad (2.13.2)$$

If $\hat y_F$ is a linear predictor of $y_F$ then $\hat y_F = Cy$. It is an unbiased predictor
if $E(\hat y_F - y_F) = 0$. Now

$$E(\hat y_F - y_F) = E(Cy - y_F) = E(CX\beta + Cu - X_F\beta - u_F) = (CX - X_F)\beta + E(Cu - u_F) = (CX - X_F)\beta$$

Thus $\hat y_F$ is unbiased for $y_F$ if $CX = X_F$. The covariance matrix of the
forecast error is

$$E(\hat y_F - y_F)(\hat y_F - y_F)' = E(Cu - u_F)(Cu - u_F)' = \sigma^2(CC' + I_r) \qquad (2.13.3)$$

if $E(uu_F') = 0$, i.e. if the errors are serially uncorrelated, where $r$ is the
number of forecast periods. If we estimate $\beta$
using $b$, the OLS estimator of $\beta$ from (2.13.1), then a predictor of $y_F$ is
given by

$$\hat y_F = X_Fb = X_F(X'X)^{-1}X'y = Cy$$

where $C = X_F(X'X)^{-1}X'$. Thus $\hat y_F$ is a linear predictor. Moreover, since
$CX = X_F(X'X)^{-1}X'X = X_F$, $\hat y_F$ is also an unbiased predictor of $y_F$. The
covariance matrix of the forecast error is, therefore, given by (2.13.3).
Consider now another linear predictor

$$\tilde y_F = (C + D)y$$

where $C = X_F(X'X)^{-1}X'$. The expected forecast error of $\tilde y_F$ is

$$E(\tilde y_F - y_F) = (CX + DX - X_F)\beta = DX\beta$$

which is zero for all values of $\beta$ if and only if $DX = 0$.
When $\tilde y_F$ is an unbiased predictor, the covariance matrix of the forecast
error of $\tilde y_F$ is (using the fact that $CX = X_F$ and $DX = 0$, so that $CD' = 0$)

$$E(\tilde y_F - y_F)(\tilde y_F - y_F)' = E[(C + D)u - u_F][(C + D)u - u_F]' = \sigma^2(CC' + DD' + I_r)$$

In order to minimise this covariance matrix we must choose $D = 0$. Thus
$\hat y_F = X_F(X'X)^{-1}X'y$ is a best linear unbiased predictor of $y_F$.


Part (b). Substituting $C = X_F(X'X)^{-1}X'$ in (2.13.3) we find that the
covariance matrix of the forecast error of $\hat y_F$ is

$$\sigma^2[X_F(X'X)^{-1}X_F' + I_r] = X_F \, \mathrm{var}(b) \, X_F' + \sigma^2 I_r \qquad (2.13.4)$$


Part (c). A one-period-ahead 95% confidence interval $(C_1, C_2)$ will satisfy
$\mathrm{Prob}[C_1 \leq \hat y_{T+1} - y_{T+1} \leq C_2] = 0.95$ and will lead to the interval

$$\hat y_{T+1} - C_2 \leq y_{T+1} \leq \hat y_{T+1} - C_1$$

In the present case the standardised forecast error $(\hat y_{T+1} - y_{T+1})/s_F$ is
distributed as $t_7$ so that we obtain $C_1 = -t_7(0.025)s_F$ and $C_2 = t_7(0.025)s_F$,
where $s_F$ is the standard error of the forecast error. The required confidence
interval is, therefore,

$$\hat y_{T+1} - t_7(0.025)s_F \leq y_{T+1} \leq \hat y_{T+1} + t_7(0.025)s_F$$

For the given estimated model

$$\hat y_{T+1} = 100 + 5(1) - 2(2) = 101$$

and

$$s_F = s[X_{T+1}(X'X)^{-1}X_{T+1}' + 1]^{1/2}$$

$$= 3\left[ (1, 1, 2) \begin{bmatrix} 10 & 10 & 20 \\ 10 & 20 & 0 \\ 20 & 0 & 90 \end{bmatrix}^{-1} \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix} + 1 \right]^{1/2}$$

$$= 3\left[ (1, 1, 2) \begin{bmatrix} 1.8 & -0.9 & -0.4 \\ -0.9 & 0.5 & 0.2 \\ -0.4 & 0.2 & 0.1 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix} + 1 \right]^{1/2}$$

$$= 3[0.1 + 1]^{1/2} = 3.146$$

A simpler way of obtaining the expression in this example is to use the
fact that $10X_{T+1}$ is a row of $X'X$ and hence $X_{T+1}(X'X)^{-1} = (0.1, 0, 0)$.
Finally the required confidence interval is

$$101 - 2.365 \times 3.146 \leq y_{T+1} \leq 101 + 2.365 \times 3.146$$

or

$$93.56 \leq y_{T+1} \leq 108.44$$


See Theil (1971, pp.119—124) for further details on forecasting. 
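
The arithmetic of part (c) is easy to reproduce. The sketch below uses the figures quoted in the solution: the inverse moment matrix, the forecast-period regressors $(1, 1, 2)$, the coefficients $(100, 5, -2)$, $s = 3$ and the tabulated value $t_7(0.025) = 2.365$. The variable names are made up for the illustration.

```python
import math

xtx_inv = [[1.8, -0.9, -0.4],
           [-0.9, 0.5, 0.2],
           [-0.4, 0.2, 0.1]]
x_f = [1.0, 1.0, 2.0]        # regressors in the forecast period
b = [100.0, 5.0, -2.0]       # estimated coefficients
s, t_crit = 3.0, 2.365       # standard error of regression and t_7(0.025)

# Point forecast x_f' b.
y_hat = sum(bi * xi for bi, xi in zip(b, x_f))

# Quadratic form x_f' (X'X)^{-1} x_f.
q = sum(x_f[i] * xtx_inv[i][j] * x_f[j] for i in range(3) for j in range(3))

# Standard error of the forecast error and the 95% interval.
s_f = s * math.sqrt(q + 1.0)
lower, upper = y_hat - t_crit * s_f, y_hat + t_crit * s_f

print(y_hat, round(q, 4), round(s_f, 3), round(lower, 2), round(upper, 2))
```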


Solution 2.14 
Part (a). If $\sigma^2$ is known then from question (2.13) $\hat y_F - y_F$ is distributed
$N[0, \sigma^2(I_r + X_F(X'X)^{-1}X_F')]$. The appropriate test statistic is then

$$K = (\hat y_F - y_F)'[I_r + X_F(X'X)^{-1}X_F']^{-1}(\hat y_F - y_F)/\sigma^2$$

which is distributed as $\chi^2_r$. If $\sigma^2$ is unknown we use instead

$$F = \frac{K/r}{s^2/\sigma^2}$$

which can be shown to be distributed as $F_{r, T-k}$. To prove these results we
note that $\hat y_F = X_F(X'X)^{-1}X'y = A'y$ and hence that $\hat y_F - y_F = B'y_+$,
where $B' = [A' : -I_r]$ and $y_+' = (y', y_F')$. It follows that

$$K = \sigma^{-2}y_+'B(I_r + A'A)^{-1}B'y_+ = \sigma^{-2}y_+'B(B'B)^{-1}B'y_+$$

which is distributed as $\chi^2_r$ since $B(B'B)^{-1}B'$ is an idempotent matrix of rank
$r$. See question 2.2 for further details of the distribution of quadratic
forms.

It has already been shown (question 2.3) that $(T - k)s^2/\sigma^2$ is
distributed as $\chi^2_{T-k}$. But

$$(T - k)s^2/\sigma^2 = \sigma^{-2}y'[I - X(X'X)^{-1}X']y = \sigma^{-2}y_+' \begin{bmatrix} I - X(X'X)^{-1}X' & 0 \\ 0 & 0 \end{bmatrix} y_+ = \sigma^{-2}y_+'Cy_+$$

where $C$ is idempotent with rank $T - k$. It follows that

$$F = \frac{y_+'B(B'B)^{-1}B'y_+/r}{y_+'Cy_+/(T - k)}$$

which has an $F$ distribution if $B(B'B)^{-1}B'C = 0$. Now

$$B'C = [A' : -I_r] \begin{bmatrix} I - X(X'X)^{-1}X' & 0 \\ 0 & 0 \end{bmatrix} = [A' - A'X(X'X)^{-1}X' : 0]$$

and $A'X(X'X)^{-1}X' = X_F(X'X)^{-1}X'X(X'X)^{-1}X' = X_F(X'X)^{-1}X' = A'$.
Therefore $B'C = 0$ as required.


Part (b). In testing for structural change (see question 2.6) we wish to
know if the model in a certain time period applies in another time period.
Consider the null hypothesis

$$H_0: y_t = x_t'\beta + u_t, \quad u_t \text{ are independent } N(0, \sigma^2) \quad (t = 1, \ldots, T + r)$$

and the two alternative hypotheses

$$H_1: \begin{cases} y_t = x_t'\beta_1 + u_t, & u_t \text{ are independent } N(0, \sigma_1^2) \quad (t = 1, \ldots, T) \\ y_t = x_t'\beta_2 + u_t, & u_t \text{ are independent } N(0, \sigma_2^2) \quad (t = T + 1, \ldots, T + r) \end{cases}$$

and

$$H_2: \begin{cases} y_t = x_t'\beta_1 + u_t & (t = 1, \ldots, T) \\ y_t = x_t'\beta_2 + u_t & (t = T + 1, \ldots, T + r) \end{cases}$$

with $u_t$ independent $N(0, \sigma^2)$, $t = 1, \ldots, T + r$. On $H_1$, structural change
is due both to a change in the coefficient vector and in the variance of $u_t$;
whereas, on $H_2$, only the coefficient vector has changed.
Turning now to the question, the test statistic derived in part (a) which
is based on forecast errors, is suitable for $H_0$ against $H_1$ as it makes no
assumptions about the alternative hypothesis. On the other hand, the
Chow test for structural change is concerned with whether or not the
coefficient vector has changed. Thus, the Chow test is preferred for $H_0$
against $H_2$. The test based on forecast errors could also be used for $H_0$
against $H_2$ but if in fact $\sigma_1^2 = \sigma_2^2 = \sigma^2$, then this test is less powerful than
the Chow test (see Jorgensen et al. 1970).


Solution 2.15 


Part (a). Consider the sequence of observations $\{x_1, x_2, \ldots, x_T\}$ on the
random variable $X$ which has a distribution function $F(X; \theta)$ depending
on an unknown parameter $\theta$. An estimator $\hat{\theta}_T$ of $\theta$ is constructed as a
function of the observations $\{x_t : t = 1, \ldots, T\}$. $\hat{\theta}_T$ is said to be a
consistent estimator for $\theta$ if for all $\varepsilon > 0$,

\[
\lim_{T\to\infty} P(|\hat{\theta}_T - \theta| > \varepsilon) = 0 \qquad (2.15.3)
\]

We then write $\operatorname{plim}_{T\to\infty} \hat{\theta}_T = \theta$.


Part (b). We first state a useful theorem given in Cramér (1946, p.182): 


Tchebycheff's theorem If $g(\xi)$ is a non-negative and integrable function of
the random variable $\xi$ then for every $K > 0$ we have

\[
P[g(\xi) \geq K] \leq E[g(\xi)]/K \qquad (2.15.4)
\]

In particular, if $g(\xi) = (\xi - \mu)^2$ and $K = k^2\sigma^2$, where $\mu$ and $\sigma^2$ denote
the mean and variance of $\xi$, then for every $k > 0$ we obtain the
Bienaymé-Tchebycheff inequality

\[
P(|\xi - \mu| \geq k\sigma) \leq 1/k^2
\]

which may also be written as

\[
P(|\xi - \mu| \geq k) \leq \sigma^2/k^2 \qquad (2.15.5)
\]

It follows that

\[
\lim_{T\to\infty} P(|\xi - \mu| \geq k) = 0 \qquad (2.15.6)
\]

if $\lim_{T\to\infty} \sigma^2 = 0$. Thus, if the estimator $\hat{\theta}_T$ is a random variable with
$E(\hat{\theta}_T) = \theta$ and $\lim_{T\to\infty} \operatorname{var}(\hat{\theta}_T) = 0$ then these conditions are sufficient
(but not necessary) to ensure that $\hat{\theta}_T$ is a consistent estimator of $\theta$.

For the linear model (2.15.1), $E(b) = \beta$ and the covariance matrix of $b$
is $V = \sigma^2(X'X)^{-1}$. If, therefore, $\lim_{T\to\infty} V = 0$, then we have conditions
sufficient to ensure that $b$ is a consistent estimator of $\beta$. But

\[
\lim_{T\to\infty} V = \lim_{T\to\infty} \sigma^2 (X'X)^{-1}
  = \sigma^2 \lim_{T\to\infty} T^{-1}(X'X/T)^{-1}
\]

By assumption $\lim_{T\to\infty} X'X/T = M$ is finite and non-singular, hence

\[
\lim_{T\to\infty} V = \sigma^2 \lim_{T\to\infty} T^{-1}M^{-1} = 0
\]

Thus $b$ is a consistent estimator of $\beta$.


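Part (c) We note that $b$, the OLS estimator of $\beta$ in (2.15.2), has $E(b) = \beta$
and $\operatorname{var}(b) = \sigma^2/\sum_{t=1}^{T}(t - \bar{t})^2$, where $\bar{t} = T^{-1}\sum_{t=1}^{T} t$. For $b$ to be consistent
for $\beta$ it will be sufficient to prove that $\lim_{T\to\infty} \operatorname{var}(b) = 0$. Now

\[
\operatorname{var}(b) = \frac{\sigma^2}{\sum_{t=1}^{T}(t - \bar{t})^2}
  = \frac{\sigma^2}{\sum_{t=1}^{T} t^2 - T\bar{t}^2}
  = \frac{\sigma^2}{T(T+1)(2T+1)/6 - T(T+1)^2/4}
  = \frac{12\sigma^2}{T(T^2 - 1)}
\]

Hence

\[
\lim_{T\to\infty} \operatorname{var}(b) = 0
\]

It is interesting to note that for this model

\[
\frac{X'X}{T} =
\begin{bmatrix} 1 & T^{-1}\sum t \\ T^{-1}\sum t & T^{-1}\sum t^2 \end{bmatrix}
=
\begin{bmatrix} 1 & (T+1)/2 \\ (T+1)/2 & (T+1)(2T+1)/6 \end{bmatrix}
\]

It follows that not all elements of $X'X/T$ have finite limits as $T \to \infty$.
Consequently, we have used Tchebycheff's theorem directly to prove that
$b$ is a consistent estimator of $\beta$ in (2.15.2).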
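
For the time-trend model the variance of b has the closed form 12σ²/[T(T² - 1)], which follows from the standard sums of t and t². The short sketch below is illustrative only (the function name trend_quantities and the value of sigma2 are assumptions): it compares the closed form with the direct calculation and prints the elements of X'X/T that grow with T.

```python
import numpy as np

sigma2 = 1.0                                   # illustrative error variance (assumption)

def trend_quantities(T):
    """Return var(b) computed directly, its closed form, and X'X/T for y_t = a + b*t + u_t."""
    t = np.arange(1, T + 1, dtype=float)
    X = np.column_stack([np.ones(T), t])       # regressors: constant and time trend
    var_direct = sigma2 / np.sum((t - t.mean()) ** 2)
    var_closed = 12.0 * sigma2 / (T * (T ** 2 - 1))
    return var_direct, var_closed, X.T @ X / T

for T in (10, 100, 1000):
    direct, closed, moment = trend_quantities(T)
    # var(b) shrinks to zero while the off-diagonal and lower-right elements of X'X/T diverge
    print(T, direct, closed, moment[0, 1], moment[1, 1])
```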


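Remark 1 The following results are useful in evaluating probability limits
(see Dhrymes, 1970, Ch. 3):
(i) If $x_T$ is a random variable with probability limit $\theta$ and $a_T$ is a
sequence of numbers with $\lim_{T\to\infty} a_T = a$ then

\[
\operatorname{plim}_{T\to\infty} (a_T x_T) = a\theta
\]

(ii) If $y_T$ is another random variable with probability limit $\mu$ and
$f(y_T, x_T)$ is a real function continuous at $f(\mu, \theta)$ then

\[
\operatorname{plim}_{T\to\infty} f(y_T, x_T) = f(\mu, \theta)
\]

This result is known as Slutsky's Theorem. Important special cases of
Slutsky's Theorem are

\[
\operatorname{plim}_{T\to\infty} (y_T + x_T) = \mu + \theta, \qquad \operatorname{plim}_{T\to\infty} (y_T x_T) = \mu\theta
\]

and

\[
\operatorname{plim}_{T\to\infty} (y_T / x_T) = \mu/\theta, \quad \text{for } \theta \neq 0.
\]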


Remark 2 Frequently the probability limit of a random variable is equal
to its expectation or to the limit as $T \to \infty$ of its expectation (assuming
these expectations exist). Sufficient conditions for evaluating the
probability limit of the random variable $x_T$ by its expectation are derived
from the Bienaymé-Tchebycheff inequality. If, for all $T$, $E(x_T) = \mu$,
$E(x_T - \mu)^2 = \sigma_T^2$ and $\lim_{T\to\infty} \sigma_T^2 = 0$ then from (2.15.5) and (2.15.6)
$\operatorname{plim}_{T\to\infty} x_T = E(x_T) = \mu$.

Sufficient conditions for evaluating $\operatorname{plim}_{T\to\infty} x_T$ by $\lim_{T\to\infty} E(x_T)$ are
derived from Tchebycheff's theorem (2.15.4). If $g(x_T) = [x_T - \lim_{T\to\infty} E(x_T)]^2$,

\[
E[x_T - \lim_{T\to\infty} E(x_T)]^2 = \sigma_T^2, \qquad \lim_{T\to\infty} \sigma_T^2 = 0
\]

and $K = k^2\sigma_T^2$, then for every $k > 0$, (2.15.4) implies (after rescaling $k$,
as in (2.15.5)) that

\[
P(|x_T - \lim_{T\to\infty} E(x_T)| \geq k) \leq \sigma_T^2/k^2
\]

and hence that

\[
\lim_{T\to\infty} P(|x_T - \lim_{T\to\infty} E(x_T)| \geq k) = 0
\]

Thus, from (2.15.3), $\operatorname{plim}_{T\to\infty} x_T = \lim_{T\to\infty} E(x_T)$.

Example The set of $T$ observations $x_1, x_2, \ldots, x_T$ are an independent
random sample from a distribution with finite moments up to the fourth
order denoted by $\{\mu_i : i = 1, \ldots, 4\}$. Show that $s^2 = T^{-1}\sum_{t=1}^{T}(x_t - \bar{x})^2$ is a
consistent estimator of the variance ($= \sigma^2$), where $\bar{x}$ is the sample mean.

Cramér (1946, pp. 345-349) has shown that $E(s^2) = \mu_2(1 - T^{-1})$ and
$E(s^4) = \mu_2^2 + (\mu_4 - 3\mu_2^2)T^{-1} - (2\mu_4 - 5\mu_2^2)T^{-2} + (\mu_4 - 3\mu_2^2)T^{-3}$. Hence
$E(s^2) \neq \sigma^2$ but $\lim_{T\to\infty} E(s^2) = \sigma^2$.

Furthermore,

\[
\begin{aligned}
E[s^2 - \lim_{T\to\infty} E(s^2)]^2 &= E(s^2 - \sigma^2)^2 = E(s^4) - 2\sigma^2 E(s^2) + \sigma^4 \\
  &= \mu_2^2 + (\mu_4 - 3\mu_2^2)T^{-1} + O(T^{-2}) - 2\sigma^4(1 - T^{-1}) + \sigma^4 \\
  &= (\mu_4 - \sigma^4)T^{-1} + O(T^{-2})
\end{aligned}
\]

where $O(T^{-2})$ denotes terms of order $T^{-2}$ and smaller. Hence
$\lim_{T\to\infty} E[s^2 - \lim_{T\to\infty} E(s^2)]^2 = 0$ and so $s^2$ is a consistent estimator of
$\sigma^2$.
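
The two moment formulae quoted from Cramér, E(s²) = μ₂(1 - 1/T) and the expansion of E(s⁴) in powers of 1/T with μ₂ and μ₄ the second and fourth central moments, can be verified exactly for a simple distribution. The sketch below is an illustration only: the Bernoulli population, the values of p and T, and the function name exact_moments_of_s2 are assumptions made here. It enumerates every possible sample and uses exact rational arithmetic.

```python
from fractions import Fraction
from itertools import product

def exact_moments_of_s2(p, T):
    """Exact E(s^2) and E(s^4) for s^2 = T^{-1} sum (x_t - xbar)^2 with x_t iid Bernoulli(p)."""
    e1 = e2 = Fraction(0)
    for sample in product((0, 1), repeat=T):       # all 2^T possible samples
        ones = sum(sample)
        prob = p ** ones * (1 - p) ** (T - ones)
        xbar = Fraction(ones, T)
        s2 = sum((x - xbar) ** 2 for x in sample) / T
        e1 += prob * s2
        e2 += prob * s2 ** 2
    return e1, e2

p, T = Fraction(1, 3), 6                 # illustrative choices (assumptions)
mu2 = p * (1 - p)                        # second central moment of Bernoulli(p)
mu4 = mu2 * (1 - 3 * mu2)                # fourth central moment of Bernoulli(p)

e_s2, e_s4 = exact_moments_of_s2(p, T)
formula_s2 = mu2 * (1 - Fraction(1, T))
formula_s4 = (mu2 ** 2 + (mu4 - 3 * mu2 ** 2) / T
              - (2 * mu4 - 5 * mu2 ** 2) / T ** 2
              + (mu4 - 3 * mu2 ** 2) / T ** 3)

print(e_s2 == formula_s2, e_s4 == formula_s4)
```

Because the arithmetic is exact, agreement here is an identity rather than an approximation; it does not depend on T being large.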


Solution 2.16 


Part (a) Although we would like to know the exact distribution of an 
estimator for the sample size T with which we are working, very often the 
derivation of this distribution presents mathematical difficulties and is, 
therefore, not usually available. However, as T tends to infinity, we are 
often able to obtain the limiting distribution of the estimator in suitably 
standardised form. This limiting distribution of the suitably standardised 
estimator gives us what we call the asymptotic distribution of the 
estimator itself; we will explain this distinction more fully below when we 
consider which function of the estimator is required to achieve a suitable 
standardisation. Frequently, the exact sampling distribution of an 
estimator can be closely approximated by the asymptotic distribution of 
the estimator even for moderate values of T. This approximation is 
commonly used when conducting statistical tests in econometrics; and it 
also helps us to analyse and compare the properties of different estimators 
of the same parameters. 

In order to derive the asymptotic distribution of the OLS estimator for 
(2.16.1) we shall use the following central limit theorem. 


Result 1: A Central Limit Theorem (Malinvaud 1970, p.251)
Suppose $\{u_t; t = 1, 2, \ldots, T\}$ are independent identically distributed
$m$-vector random variables with zero means and finite covariance matrix
$\Omega$. We define

\[
x_T = \frac{1}{\sqrt{T}} \sum_{t=1}^{T} A_t u_t \qquad (2.16.4)
\]

where $A_t$ is an $n \times m$ matrix of non-random variables that are uniformly
bounded (i.e. $|A_t| < R$ for all $t$ and some finite $R$), such that
$\lim_{T\to\infty} T^{-1}\sum_{t=1}^{T} A_t \Omega A_t' = V$ is finite. Then the limiting distribution of
$x_T$ is $N(0, V)$.


We shall also make use of the following result:


Result 2: (Cramér, 1946, p.254, Dhrymes, 1970, pp.112-114) Suppose
that the sequence $\{x_T, y_T; T = 1, 2, \ldots\}$ is such that $x_T$ has a limiting
distribution represented by the distribution function $F(x)$ and $y_T$
converges in probability to the constant $c$ (i.e. $\operatorname{plim}_{T\to\infty} y_T = c$) then
$x_T y_T$ has the same limiting distribution as $cx$, namely, $F(cx)$.

We now make the following assumptions:


Assumption 2.16.A The vector of disturbances $u$ in (2.16.1) has elements
$u_t$ ($t = 1, \ldots, T$) which are independently and identically distributed
with zero mean and constant variance $\sigma^2$ for all values of $t$.


Assumption 2.16.B The elements $x_{it}$ of $X$ are uniformly bounded (i.e.
$|x_{it}| < R$ for all $t$, all $i$ and some finite $R$) and $\lim_{T\to\infty} T^{-1}X'X = M$ is
finite and non-singular.


Under the assumptions (2.16.A-B) and using Results 1 and 2 we can
derive the asymptotic distribution of $b$, the OLS estimator of $\beta$. We note
that

\[
b = (X'X)^{-1}X'y = \beta + (X'X)^{-1}X'u \qquad (2.16.5)
\]

which can be written alternatively as

\[
\sqrt{T}(b - \beta) = \left(\frac{X'X}{T}\right)^{-1} \frac{X'u}{\sqrt{T}} \qquad (2.16.6)
\]

It follows that $\sqrt{T}(b - \beta)$ has the same limiting distribution, if it exists,
as $(X'X/T)^{-1}(T^{-1/2}X'u)$. The limiting distribution of $X'u/\sqrt{T}$ can be
obtained by noting, in the notation of Result 1, that

\[
\frac{X'u}{\sqrt{T}} = \frac{1}{\sqrt{T}} \sum_{t=1}^{T} A_t u_t
\]

where $A_t = x_t$, the (transposed) $t$th row of $X$. Since $u_t$ is a scalar random variable with
variance $\sigma^2$ we have $\Omega = \sigma^2$. Thus

\[
\lim_{T\to\infty} \frac{1}{T}\sum_{t=1}^{T} A_t \Omega A_t'
  = \sigma^2 \lim_{T\to\infty} \frac{1}{T}\sum_{t=1}^{T} x_t x_t'
  = \sigma^2 \lim_{T\to\infty} \frac{X'X}{T}
  = \sigma^2 M
\]

which is finite. It now follows from Result 1 that the limiting distribution
of $X'u/\sqrt{T}$ is $N(0, \sigma^2 M)$. By virtue of Result 2 above, and
$\lim_{T\to\infty} (X'X/T)^{-1} = M^{-1}$, the limiting distribution of $(X'X/T)^{-1}(T^{-1/2}X'u)$ is the same
as that of $M^{-1}(T^{-1/2}X'u)$, namely $N(0, \sigma^2 M^{-1}MM^{-1}) = N(0, \sigma^2 M^{-1})$.
Thus, the limiting distribution of $\sqrt{T}(b - \beta)$ is normal with mean vector
zero and covariance matrix $\sigma^2 M^{-1}$.

Alternatively, we say that the asymptotic distribution of $b$ is
$N[\beta, \sigma^2(X'X)^{-1}]$. This is the terminology used by Cramér (1946, pp. 213-
214). Recall that in question 2.15 we showed that the covariance matrix
of $b$ tends to zero as $T \to \infty$. Since $\operatorname{plim}_{T\to\infty} b = \beta$, it follows from the very
definition of a probability limit that the distribution of $b$ collapses on to
the point $\beta$ as $T \to \infty$; and we say that $b$ has a degenerate limiting
distribution, i.e. a distribution which assigns the probability 1 to the value
$\beta$. In general, in finite samples, $b$ will have a non-degenerate sampling
distribution which we often wish to characterise or approximate. We can
use the fact that the standardised statistic $\sqrt{T}(b - \beta)$ has a non-degenerate
limiting distribution to construct an approximate distribution for $b$. In
fact, the limiting distribution of $\sqrt{T}(b - \beta)$ is normal with zero mean and
covariance matrix which is the limit as $T \to \infty$ of $\sigma^2(X'X/T)^{-1}$; so an
approximation to the distribution of $b$ based on this asymptotic result is
just the normal distribution with mean $\beta$ and covariance matrix $\sigma^2(X'X)^{-1}$.
We call this the asymptotic distribution of $b$.
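
A small Monte Carlo experiment illustrates how the covariance matrix of √T(b - β) is matched by σ²(X'X/T)⁻¹ even when the disturbances are not normal. Everything specific in this sketch is an assumption chosen for illustration: the seed, the sample size and number of replications, the bounded regressor cos(t), and the uniform disturbances.

```python
import numpy as np

rng = np.random.default_rng(12345)        # arbitrary seed (assumption)
T, reps, sigma2 = 200, 4000, 1.0          # illustrative settings (assumptions)

t = np.arange(1, T + 1)
X = np.column_stack([np.ones(T), np.cos(t)])      # uniformly bounded regressors
XtX = X.T @ X

# Non-normal disturbances: uniform on [-sqrt(3), sqrt(3)] has mean 0 and variance 1.
U = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(reps, T)) * np.sqrt(sigma2)

# Row i holds b - beta = (X'X)^{-1} X'u for replication i.
errors = U @ X @ np.linalg.inv(XtX)
scaled = np.sqrt(T) * errors                      # sqrt(T) (b - beta)

empirical_cov = np.cov(scaled, rowvar=False)
asymptotic_cov = sigma2 * np.linalg.inv(XtX / T)  # sigma^2 (X'X/T)^{-1}
print(empirical_cov)
print(asymptotic_cov)
```

The simulated vector `scaled` could also be inspected with a histogram or a normality test to see how close the sampling distribution is to the normal limit for moderate T.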


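Part (b) We are given that

\[
\lim_{T\to\infty} \frac{X'X}{T} =
\]

and so, using Result 1 above, the limiting distribution of $\sqrt{T}(b - \beta)$ is

Part (c) We note that Assumption (2.16.B) no longer applies as we have
shown in question (2.15) that not all of the elements of $X'X/T$ have finite
limits. We shall, therefore, adopt an alternative approach: instead of
considering the limiting distribution of the function $\sqrt{T}(b - \beta)$, we shall
consider that of the function $(b - \beta)/\sqrt{\operatorname{var}(b)}$. Now we know that

\[
b - \beta = \frac{\sum_{t=1}^{T}(t - \bar{t})u_t}{\sum_{t=1}^{T}(t - \bar{t})^2}
\quad \text{and} \quad
\operatorname{var}(b) = \frac{\sigma^2}{\sum_{t=1}^{T}(t - \bar{t})^2}
\]

Hence

\[
\frac{b - \beta}{[\operatorname{var}(b)]^{1/2}}
  = \frac{\sum_{t=1}^{T}(t - \bar{t})u_t}{[\sigma^2 \sum_{t=1}^{T}(t - \bar{t})^2]^{1/2}}
  = \frac{1}{\sqrt{T}} \sum_{t=1}^{T} A_t u_t
\]

where

\[
A_t = \frac{\sqrt{T}\,(t - \bar{t})}{[\sigma^2 \sum_{t=1}^{T}(t - \bar{t})^2]^{1/2}}
\]

Thus, with $\Omega = \sigma^2$ we have

\[
\lim_{T\to\infty} \frac{1}{T}\sum_{t=1}^{T} A_t \Omega A_t'
  = \lim_{T\to\infty} \frac{\sigma^2 \sum_{t=1}^{T} T(t - \bar{t})^2}{T\sigma^2 \sum_{t=1}^{T}(t - \bar{t})^2} = 1.
\]

It follows from Result 1 that the limiting distribution of $(b - \beta)/\sqrt{\operatorname{var}(b)}$
is $N(0, 1)$.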
It might be helpful to supplement the answer to this question with a 
few additional remarks on limiting distributions and, in particular, on the 
appropriate choice of the function we use to standardise the estimator. 


Remark 1: If $x_T$ defined in equation (2.16.4) fails to satisfy the condition
$\lim_{T\to\infty} T^{-1}\sum_{t=1}^{T} A_t \Omega A_t' = V$ due to the unboundedness of $A_t$, it may be
possible to obtain the appropriate limiting distribution by considering
instead the limiting distribution of the function of $x_T$ given by $y_T =
\theta(T)x_T$ where $\theta(T)$ is a matrix with elements dependent on $T$. It follows
that

\[
y_T = \frac{1}{\sqrt{T}} \sum_{t=1}^{T} A_t^* u_t
\]

where $A_t^* = \theta(T)A_t$. We may now appeal to Result 1 if $A_t^*$ is bounded
uniformly in $t$ and $\lim_{T\to\infty} T^{-1}\sum_{t=1}^{T} A_t^* \Omega A_t^{*\prime} = V^*$ is finite. Given these
conditions, the limiting distribution of $y_T$ is $N(0, V^*)$. We must, therefore,
choose $\theta(T)$ to satisfy these conditions.

Applying this result to part (c) we can show that whilst $\sqrt{T}(b - \beta)$ has
a degenerate limiting distribution, the function $T^{3/2}(b - \beta)$ has a
non-degenerate limiting distribution. This implies that we must choose
$\theta(T) = T$. To see this, consider

\[
\sqrt{T}(b - \beta) = \frac{1}{\sqrt{T}} \sum_{t=1}^{T} A_t u_t,
\qquad A_t = \frac{T(t - \bar{t})}{\sum_{t=1}^{T}(t - \bar{t})^2}
\]

\[
\lim_{T\to\infty} \operatorname{var}(\sqrt{T}(b - \beta))
  = \lim_{T\to\infty} \frac{1}{T}\sum_{t=1}^{T} A_t \Omega A_t'
  = \lim_{T\to\infty} \frac{T\sigma^2}{\sum_{t=1}^{T}(t - \bar{t})^2}
  = \lim_{T\to\infty} \frac{12\sigma^2}{T^2 - 1}
  = 0
\]

Hence, $\sqrt{T}(b - \beta)$ has a degenerate limiting distribution.
Consider now

\[
T^{3/2}(b - \beta) = \frac{1}{\sqrt{T}} \sum_{t=1}^{T} A_t^* u_t
\]

where $A_t^* = T^2(t - \bar{t})/\sum_{t=1}^{T}(t - \bar{t})^2$ and

\[
\lim_{T\to\infty} \operatorname{var}(T^{3/2}(b - \beta))
  = \lim_{T\to\infty} \frac{1}{T}\sum_{t=1}^{T} A_t^* \Omega A_t^{*\prime}
  = \lim_{T\to\infty} T^2 \cdot \frac{1}{T}\sum_{t=1}^{T} A_t \Omega A_t'
  = \lim_{T\to\infty} \frac{12\sigma^2 T^2}{T^2 - 1}
  = 12\sigma^2
\]

It follows that the limiting distribution of $T^{3/2}(b - \beta)$ is $N(0, 12\sigma^2)$.
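Remark 2 In the multivariate case the matrix $\theta(T)$ is usually a diagonal
matrix but the elements on the diagonal need not be identical. Consider
once more the example in part (c). Suppose we want to obtain the limiting
distributions associated with $a$ and $b$, the OLS estimators of $\alpha$ and $\beta$. We
can show that when we choose the matrix $\theta(T)$ to be

\[
\theta(T) = \begin{bmatrix} 1 & 0 \\ 0 & T \end{bmatrix}
\]

the limiting distribution of $\sqrt{T}\,\theta(T)(a - \alpha, b - \beta)'$ is non-degenerate. Now

\[
\begin{bmatrix} a - \alpha \\ b - \beta \end{bmatrix}
= \begin{bmatrix} T & \sum t \\ \sum t & \sum t^2 \end{bmatrix}^{-1}
  \begin{bmatrix} \sum u_t \\ \sum t u_t \end{bmatrix}
= \begin{bmatrix}
    \sum_t (\sum_s s^2 - T\bar{t}\,t)u_t \,/\, [T\sum_s (s - \bar{t})^2] \\
    \sum_t (t - \bar{t})u_t \,/\, \sum_s (s - \bar{t})^2
  \end{bmatrix}
\]

Hence,

\[
\sqrt{T}\begin{bmatrix} a - \alpha \\ b - \beta \end{bmatrix}
  = \frac{1}{\sqrt{T}} \sum_{t=1}^{T} A_t u_t
\]

where,

\[
A_t = \frac{1}{\sum_s (s - \bar{t})^2}
  \begin{bmatrix} \sum_s s^2 - T\bar{t}\,t \\ T(t - \bar{t}) \end{bmatrix}
\]

Now we introduce $\theta(T)$ as defined above, so that $A_t^* = \theta(T)A_t$ and

\[
\frac{1}{T}\sum_{t=1}^{T} A_t^* \Omega A_t^{*\prime}
  = \frac{\sigma^2}{\sum_t (t - \bar{t})^2}
    \begin{bmatrix} \sum t^2 & -T^2\bar{t} \\ -T^2\bar{t} & T^3 \end{bmatrix}
  = \frac{12\sigma^2}{T(T^2 - 1)}
    \begin{bmatrix} T(T+1)(2T+1)/6 & -T^2(T+1)/2 \\ -T^2(T+1)/2 & T^3 \end{bmatrix}
\]

Hence,

\[
\lim_{T\to\infty} \frac{1}{T}\sum_{t=1}^{T} A_t^* \Omega A_t^{*\prime}
  = 2\sigma^2 \begin{bmatrix} 2 & -3 \\ -3 & 6 \end{bmatrix}
\]

which, as required, is clearly finite. It follows that the limiting distribution
of the vector $(\sqrt{T}(a - \alpha), T^{3/2}(b - \beta))$ is normal with mean vector zero
and covariance matrix

\[
2\sigma^2 \begin{bmatrix} 2 & -3 \\ -3 & 6 \end{bmatrix}
\]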
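
The limiting covariance matrix 2σ²[[2, -3], [-3, 6]] for (√T(a - α), T^{3/2}(b - β)) can be checked by scaling the exact finite-sample covariance σ²(X'X)⁻¹ of the OLS estimators in the trend model y_t = α + βt + u_t. In this sketch the value of sigma2, the sample sizes and the function name scaled_covariance are illustrative assumptions.

```python
import numpy as np

sigma2 = 1.0                                        # illustrative error variance (assumption)
limit = 2.0 * sigma2 * np.array([[2.0, -3.0],
                                 [-3.0, 6.0]])      # limiting covariance matrix

def scaled_covariance(T):
    """Exact covariance of (sqrt(T)(a - alpha), T^{3/2}(b - beta)) for y_t = alpha + beta*t + u_t."""
    t = np.arange(1, T + 1, dtype=float)
    X = np.column_stack([np.ones(T), t])
    D = np.diag([np.sqrt(T), T ** 1.5])             # sqrt(T) * theta(T) with theta(T) = diag(1, T)
    return D @ (sigma2 * np.linalg.inv(X.T @ X)) @ D

for T in (10, 100, 10_000):
    print(T)
    print(scaled_covariance(T))                     # approaches `limit` as T grows
```

Using a single scaling √T for both coefficients instead would drive the variance of the slope term to zero, which is the degenerate case discussed in Remark 1.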


Remark 3 We have seen how the choice of an appropriate standardisation
of an estimator is crucial in obtaining its asymptotic distribution. We now
introduce the concept of a probability order operation and we show how
this is useful in determining an appropriate standardisation.

Suppose that $z_T$ is a random variable which does not possess a limiting
distribution, but there exists a function $\phi(T)$ such that $z_T/\phi(T)$ does have
a limiting distribution, then $z_T$ will be of order $\phi(T)$ in probability. We
write this as $O[\phi(T)]$ in probability, or, $O_p[\phi(T)]$. Formally we define this
concept as follows: $z_T = O_p[\phi(T)]$ if for all $\varepsilon > 0$ there exists $R > 0$ such
that $P[|z_T| > R\phi(T)] < \varepsilon$ for all values of $T$.

To illustrate, consider finding the order in probability of the mean $\bar{x}$
of a random sample of $T$ observations $\{x_t; t = 1, \ldots, T\}$ drawn
independently from a population with mean $\mu$ and variance $\sigma^2$. We know
that the limiting distribution of $\sqrt{T}(\bar{x} - \mu)$ is $N(0, \sigma^2)$.

From Tchebycheff's theorem and, in particular, (2.15.4),

\[
P[(\bar{x} - \mu)^2 \geq K^2] \leq \frac{E(\bar{x} - \mu)^2}{K^2} = \frac{\sigma^2}{TK^2}
\]

which implies that

\[
P(|\bar{x} - \mu| > K) \leq \frac{\sigma^2}{TK^2} \qquad (2.16.7)
\]

We now let $\varepsilon = \sigma^2/(TK^2)$ and $R = R(\varepsilon) = \sigma/\sqrt{\varepsilon}$ and then (2.16.7)
becomes

\[
P(|\bar{x} - \mu| > RT^{-1/2}) \leq \varepsilon \qquad (2.16.8)
\]

It follows from our definition that $(\bar{x} - \mu) = O_p(T^{-1/2})$, or since (2.16.8)
can be rewritten as

\[
P(|\sqrt{T}(\bar{x} - \mu)| > R) \leq \varepsilon
\]

$\sqrt{T}(\bar{x} - \mu) = O_p(1)$ as $T \to \infty$. Similarly, for the OLS estimator $b$ of part
(a) defined in (2.16.6), we can deduce that $b - \beta = O_p(T^{-1/2})$, whilst for
that obtained in part (c) $b - \beta = O_p(T^{-3/2})$.

We also state the following useful result (see Mann and Wald, 1943).
If $y_T$ is $O_p[\phi_1(T)]$, $z_T$ is $O_p[\phi_2(T)]$ and $\lim_{T\to\infty} \phi_2(T)/\phi_1(T) = 0$, then
$x_T = y_T + z_T$ is $O_p[\phi_1(T)]$. In other words, as the term $z_T$ is of lower
order in probability than $y_T$, it is $y_T$ that determines the order in
probability of $x_T$. We can, therefore, neglect $z_T$ when finding the
asymptotic distribution of $x_T$. To see this we can write

\[
\frac{x_T}{\phi_1(T)} = \frac{y_T}{\phi_1(T)} + \frac{z_T}{\phi_1(T)}
\]

and we need only verify that $z_T/\phi_1(T) \to 0$ in probability as $T \to \infty$. In
fact,

\[
\frac{z_T}{\phi_1(T)} = \left(\frac{z_T}{\phi_2(T)}\right)\left(\frac{\phi_2(T)}{\phi_1(T)}\right)
\]

and since $z_T = O_p[\phi_2(T)]$ we have $z_T/\phi_2(T) = O_p(1)$ (i.e. it is bounded
in probability) whereas $\phi_2(T)/\phi_1(T) \to 0$ as $T \to \infty$.


Solution 2.17


Part (a) Equation (2.17.3) is estimated on $H_0$ and equation (2.17.2) is
estimated on $H_1$. We base our test of $H_0$ against $H_1$ upon whether there
is a significant increase in the residual sum of squares as a result of
imposing the restrictions of $H_0$. If this increase is significant then we shall
conclude that these restrictions provide an effective constraint on the
parameter space $\{\beta_i\}$ and hence we shall reject $H_0$. If the increase is not
significant then the restrictions appear to impose no constraint and $H_0$ is
not rejected.

From question (2.3) our test statistic is

\[
F = \frac{RSS_0 - RSS_1}{RSS_1} \cdot \frac{T - 5}{2}
\]

which, since $T = 22$, is distributed as $F_{2,17}$. Thus

\[
F = \frac{0.6788 - 0.4277}{0.4277} \cdot \frac{17}{2} = 4.99
\]

As $F_{2,17}(0.05) = 3.59$ and $F_{2,17}(0.01) = 6.11$ this is significant at the 5%
level but not at the 1% level.

It is interesting to compare this result with individual tests of $\beta_1 = -1$,
$\beta_2 = 0$. The t-statistics are $-0.49$ and $1.71$ respectively which are not
significant at the 5% level [$t_{17}(0.05) = 2.11$]. In other words, whereas we
would accept either hypothesis individually, as a joint hypothesis we
would reject these values for $\beta_1$ and $\beta_2$ at the 5% level of significance.
The probable reason for this is that $\log P_C$ and $\log P_I$ are fairly highly
correlated with each other but not with $\log P_B$ and $\log Y$. Furthermore,
jointly they explain a significant amount of variation in $\log Q$ but the data
do not enable us to measure their separate influences sufficiently precisely.
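
The arithmetic of the F test can be reproduced directly from the two residual sums of squares. In this sketch the variable names are illustrative, and k = 5 is inferred from the stated 17 denominator degrees of freedom with T = 22; the critical values are those quoted in the solution rather than computed.

```python
rss_restricted = 0.6788      # residual sum of squares under H0, from the solution
rss_unrestricted = 0.4277    # residual sum of squares under H1, from the solution
T, k, q = 22, 5, 2           # observations, unrestricted coefficients (inferred), restrictions

# F = [(RSS_0 - RSS_1) / q] / [RSS_1 / (T - k)]
f_stat = (rss_restricted - rss_unrestricted) / rss_unrestricted * (T - k) / q

crit_5, crit_1 = 3.59, 6.11  # F(2, 17) critical values quoted in the solution
print(round(f_stat, 2))
print(f_stat > crit_5, f_stat > crit_1)   # reject at 5%, do not reject at 1%
```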


Part (b) With this in mind we should refer to equation (2.17.2) when we
interpret the equation further. First we should notice that neither $\beta_3$ nor
$\beta_4$ is significantly different from zero even at the 10% level against a two-
sided alternative; the t-statistics are 1.39 and 0.69, respectively. However,
without information on $\operatorname{var}(\log Q + \log P_C)$ or $R^2$ on $H_0$ we cannot infer
anything about the joint significance of $\beta_3$ and $\beta_4$.

All the coefficients are elasticities and they have the expected sign; the 
own-price elasticity is negative, the price elasticities of Indian tea and 
Brazilian coffee are positive indicating that they are substitutes for Ceylon 
tea. From the greater significance of Indian tea it appears to be the 
stronger substitute, indeed given that they are expected to be substitutes 
it is probably sensible to use one-sided alternative hypotheses. Indian tea 
is now significant at the 5% level but Brazilian coffee is not. The income
elasticity is positive, indicating that Ceylon tea is a normal good, but is
less than unity and is not significant, suggesting that Ceylon tea is also a
necessity.

Whether or not equation (2.17.1) is a satisfactory demand function for 
Ceylon tea is more problematical — assuming, that is, that an import 
function can be interpreted in this way. We know from demand analysis 
that log-linear demand functions cannot be derived from a reasonable 
utility function except in the special case of unitary income and own- 
price elasticities and zero cross-price elasticities (see H. A. John Green, 
19:7 hy). 

The present results do not appear to satisfy these constraints. Another 
restriction from demand theory is the homogeneity condition which 
requires that the sum of the price elasticities and the income elasticity be 
zero. In equation (2.17.2) the sum is 0.143 which is small and probably 
not significantly different from zero. But without the standard error of 
this sum, for which we require the full covariance matrix of the estimates, 
we cannot perform a formal test of the homogeneity condition. Also in 
the absence of the demand functions for other goods we cannot test the 
other restrictions derived from demand theory. In conclusion, therefore, 
we may say that (2.17.1) is at best a useful empirical relationship which is 
not very well determined with the present data; it would be desirable to 
obtain a more precise set of estimates by using further data. 


Solution 2.18 


Part (a) If labour is paid its marginal product then 


3 au 
0Q “= avpA e/’ Q OPP E =a) 
OL 18 


or, rearranging,

\[
\ln\frac{Q}{L} = \text{constant} + \sigma \ln\frac{W}{P} - \frac{(1 - \sigma)(1 - \nu)}{\nu} \ln Q \qquad (2.18.3)
\]

where $\sigma = 1/(1 + \rho)$. Comparing (2.18.2) and (2.18.3) we can see
immediately that $\hat{\sigma} = 0.654$. Let the coefficient of $\ln Q$ be $\theta$; then $\theta =
-(1 - \sigma)(1 - \nu)/\nu$ implies that $\nu = 1/[1 - \theta/(1 - \sigma)]$ and hence that
$\hat{\nu} = 0.924$.


Part (b) (i) The statistic appropriate for testing $H_0$ against $H_1$ is
$Z = (0.654 - 1)/0.266 = -1.30$. Assuming that this is normally distributed,
for we have no information on the number of observations to enable us to
judge how good the normal approximation is, this is not significant at the
5% level. We may conclude, therefore, that on this evidence $\sigma = 1$ cannot
be rejected and hence we cannot reject the hypothesis that the production
function is Cobb-Douglas.

(ii) In order to test $H_0$ against $H_1'$ we must first derive the limiting
distribution of $\sqrt{T}(\hat{\nu} - \nu)$. We can then estimate this distribution and
perform a test using asymptotic theory.

In deriving this limiting distribution we shall need to use an extension
of a theorem due to Cramér (1946, pp.213-218). Consider the scalar
function $g(\theta_1, \theta_2, \ldots, \theta_n)$ with continuous derivatives of first and
second order. Suppose that $\hat{\theta}$ is an estimator of $\theta = (\theta_1, \theta_2, \ldots, \theta_n)'$ and
the limiting distribution of $\sqrt{T}(\hat{\theta} - \theta)$ is $N(0, \Omega)$. Let $\phi$ denote the $n \times 1$
vector of first-order derivatives $\partial g/\partial\theta$; then the limiting distribution of
$\sqrt{T}[g(\hat{\theta}_1, \ldots, \hat{\theta}_n) - g(\theta_1, \ldots, \theta_n)]$ is $N(0, \phi'\Omega\phi)$. (See Theil, 1971,
pp.373-375 for a further discussion of Cramér's Theorem.)

From part (a),

\[
\nu = 1/[1 - \theta/(1 - \sigma)] = g(\sigma, \theta)
\]

Let the limiting distribution of $\sqrt{T}(\hat{\sigma} - \sigma, \hat{\theta} - \theta)$ be $N(0, V)$. Then from
the results of (2.18.1) and the assumed zero covariance between $\hat{\sigma}$ and $\hat{\theta}$,
our estimate of $V$ is

\[
\hat{V} = T \begin{bmatrix} 0.266^2 & 0 \\ 0 & 0.0403^2 \end{bmatrix}
\]

It follows from Cramér's Theorem that the limiting distribution of
$\sqrt{T}(\hat{\nu} - \nu) = \sqrt{T}[g(\hat{\sigma}, \hat{\theta}) - g(\sigma, \theta)]$ is $N(0, \phi'V\phi)$
where

\[
\phi' = \left(\frac{\partial g}{\partial \sigma}, \frac{\partial g}{\partial \theta}\right)
\]

Replacing the unknowns $\nu$, $\sigma$ and $\theta$ by their estimates we obtain an
estimate of $\phi$, $\hat{\phi}' = (-0.204, -2.468)$. It follows that our estimate of
$\phi'V\phi$ is $0.0128T$. Thus, dividing by $T$, $\hat{\nu}$ is approximately distributed as
$N(\nu, 0.0128)$ and hence the required test statistic for $H_0$ against $H_1'$ is

\[
\frac{0.924 - 1}{\sqrt{0.0128}} = -0.67
\]

which is not significant even at the 10% level. We may conclude,
therefore, that we cannot reject the hypothesis of constant returns to
scale, i.e. that $\nu = 1$.
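
The two test statistics in part (b) can be reproduced with a few lines of arithmetic. The sketch below makes some assumptions that are not in the solution: the estimate of θ is not quoted in this section, so it is backed out from ν = 1/[1 - θ/(1 - σ)] using the reported estimates 0.654 and 0.924; the derivatives are obtained by differentiating that same expression; and all variable names are illustrative. The sign attached to the second derivative depends on the sign convention used for θ and does not affect the variance, because the covariance between the two estimates is taken to be zero.

```python
import math

sigma_hat, se_sigma = 0.654, 0.266     # estimate and standard error from the solution
nu_hat, se_theta = 0.924, 0.0403       # estimate of nu and standard error of theta-hat from the solution

# theta-hat is not quoted here; back it out from nu = 1 / [1 - theta / (1 - sigma)].
theta_hat = (1.0 - sigma_hat) * (1.0 - 1.0 / nu_hat)

def g(sigma, theta):
    """Returns-to-scale parameter nu as a function of (sigma, theta)."""
    return 1.0 / (1.0 - theta / (1.0 - sigma))

# Analytic first derivatives of g, evaluated at the estimates.
dg_dsigma = theta_hat * nu_hat ** 2 / (1.0 - sigma_hat) ** 2
dg_dtheta = nu_hat ** 2 / (1.0 - sigma_hat)

# Delta-method variance of nu-hat with zero covariance between the two estimates.
var_nu = dg_dsigma ** 2 * se_sigma ** 2 + dg_dtheta ** 2 * se_theta ** 2

z_sigma = (sigma_hat - 1.0) / se_sigma          # test of sigma = 1 (Cobb-Douglas)
z_nu = (nu_hat - 1.0) / math.sqrt(var_nu)       # test of nu = 1 (constant returns to scale)

print(round(z_sigma, 2), round(var_nu, 4), round(z_nu, 2))
```

Working with the variance of the estimate itself, rather than of √T times the estimate, avoids carrying the unknown sample size T through the calculation.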


Solution 2.19 


Part (a) Equation (2.19.3) seems to imply that investment realisations
are greater the greater are investment intentions but that these plans can
be adversely affected by a fall in the current year's level of capacity
utilisation. Both of these effects are highly significant and together
explain 90% of the variance of $I$. The coefficient of $A$ is not significantly
different from unity implying that we cannot reject the hypothesis that
the discrepancy between realisations and plans is explained by the change
in capacity utilisation. Equation (2.19.2) is estimated on the assumption
that this hypothesis is true. We should note that the lower $R^2$ value for
(2.19.2) cannot be compared directly with that for equation (2.19.3) as
they refer to different variances; the former to the variance in the
discrepancy and the latter to the variance of realisations.

Having attributed the failure of anticipations to explain realisations 
exactly to short-run conditions in capacity utilisation, we turn now to the 
question of how to explain anticipations. Equation (2.19.1) suggests that 
planned investment is explained solely by the average level of capacity 
utilisation in the previous year; the greater is capacity utilisation the 
higher is the level of planned investment. The fit is remarkably good. Thus 
it would appear that firms invest to relieve the pressure on capacity. 

An implication of this result is that we can explain investment 
realisations without direct recourse to anticipations at all. Combining 
equations (2.19.1) and (2.19.2) we obtain (the reduced form) equation 
(2.19.4). The reported estimates of (2.19.4) are direct estimates; they are
not solved from (2.19.1) and (2.19.3) and do not impose the restriction
that the coefficient of $A_t$ is unity. We notice that the estimates of (2.19.4)
are very similar to the solved reduced form which is

\[
I_t = \text{const} + 54.60\,C_{t-1} - 19.96\,(C_t - C_{t-1}) \qquad (2.19.5)
\]

The coefficients of (2.19.4) are not significantly different from those of
(2.19.5).

In conclusion we may say that whilst anticipations can be helpful in 
explaining realisations they are neither necessary nor sufficient. They are 
not sufficient because plans are revised if there are changes in the level of 
economic activity from that prevailing when they were first drawn up. 
They are not necessary because we can explain anticipations fairly well by 
the level of capacity utilisation in the previous period. 


Part (b) There is a very simple policy implication arising from these 
results: to raise the level of investment and to keep it from falling, 
maintain a high and steady level of capacity utilisation. Failure to do so 
will cause a loss of investment. 


Solution 2.20 
Part (a) Marginal cost pricing predicts that firms select a level of output
that equates marginal revenue with marginal cost. It is usually assumed
that these firms are maximising short-run profits $\pi$. Suppose that

\[
\pi = PQ - C(Q) \qquad (2.20.3)
\]

where $P$ is price, $Q$ is output and $C(Q)$ is total cost, then the first-order
condition for profit maximisation is

\[
P = \frac{\eta}{1 + \eta}\,\frac{dC}{dQ} \qquad (2.20.4)
\]

where

\[
\eta = \frac{\partial Q}{\partial P}\,\frac{P}{Q}
\]

is the price elasticity of demand. It follows that a change in price can be
expressed as

\[
\Delta P = \Delta\!\left(\frac{\eta}{1 + \eta}\right)\frac{dC}{dQ}
  + \frac{\eta}{1 + \eta}\,\Delta\!\left(\frac{dC}{dQ}\right) \qquad (2.20.5)
\]

In other words, a change in price is due to both demand and cost factors.
We consider these in turn.

Suppose that total cost consists of labour costs and material costs such
that

\[
C(Q) = [ULC(Q) + UMC(Q)]\,Q. \qquad (2.20.6)
\]

Then,

\[
\frac{dC}{dQ} = ULC + UMC + \frac{d}{dQ}[ULC(Q) + UMC(Q)] \cdot Q \qquad (2.20.7)
\]

and so

\[
\Delta\frac{dC}{dQ} = \Delta ULC + \Delta UMC
  + \Delta\!\left\{\frac{d}{dQ}[ULC(Q) + UMC(Q)] \cdot Q\right\} \qquad (2.20.8)
\]

In part, the last term reflects changes in costs due to the level of output. 
Let this be proxied by the level of capacity utilisation. 

The influence of the demand factors operates through their effect on 
the sensitivity of demand to changes in price. One of the most important 
of these is the willingness of consumers to maintain an unfilled order in 
the face of a price rise. We postulate that an increase in the ratio of 
unfilled orders to sales will cause consumers to be less prepared to 
abandon their orders as prices rise. Thus AP is greater the greater is 
A(O/S). Our final equation for price change under marginal cost pricing 
may, therefore, closely resemble (2.20.1). 

Theories B and C are often thought to be more appropriate for
oligopolistic firms for whom long-run costs exert a stronger influence than
short-run costs and demand. These theories can be treated together as full
cost pricing is a variant of target return pricing. Target return pricing
implies that firms cover normal or standard unit costs plus a markup to
achieve a target rate of return at normal output levels. Thus

\[
P = \frac{\pi K}{Q^N} + ULC^N + UMC^N \qquad (2.20.9)
\]

where $\pi$ is the target rate of return on capital $K$; $N$ denotes a normal level.

The full cost pricing method multiplies standard unit costs by a markup
factor $\lambda$. Thus

\[
P = (1 + \lambda)(ULC^N + UMC^N) \qquad (2.20.10)
\]

Consequently theory B implies

\[
\Delta P = (\Delta\pi)\frac{K}{Q^N} + \pi\,\Delta\!\left(\frac{K}{Q^N}\right) + \Delta ULC^N + \Delta UMC^N \qquad (2.20.11)
\]

and C implies

\[
\Delta P = (1 + \lambda)(\Delta ULC^N + \Delta UMC^N) + \Delta\lambda\,(ULC^N + UMC^N) \qquad (2.20.12)
\]

Equation (2.20.2) corresponds to (2.20.11) with $\Delta\pi = 0$ and $K/Q^N$
measured by capacity utilisation whilst (2.20.2) corresponds to (2.20.12)
with $\Delta\lambda = 0$.


Part (b) According to theory A, all the coefficients of equation (2.20.1)
should be positive, with the coefficients of $\Delta ULC$ and $\Delta UMC$ equal but,
given the way the variables are measured, not necessarily equal to unity.
We notice that all are positive and significant at the 5% level against one-
sided alternatives. Moreover, the coefficients of $\Delta ULC$ and $\Delta UMC$ are
very similar in size. As we do not have an estimate of the covariance
between the estimates of these coefficients or an estimate of (2.20.12)
with the restriction imposed, we do not have sufficient information to
perform a test of their equality, but even assuming that the two estimates
are highly negatively correlated, thereby minimising the standard error of
the sum, since $\operatorname{var}(a_1 + a_2) = \operatorname{var}(a_1) + \operatorname{var}(a_2) + 2\operatorname{cov}(a_1, a_2)$, it seems
unlikely that we can reject the null hypothesis of equality. The evidence
so far, therefore, strongly supports theory A.

Theories B and C predict that the coefficients of $\Delta(ULC - ULC^N)$,
$\Delta ULC_{-1}$, and $\Delta(O/S)_{-1}$ are zero. In addition theory C predicts that the
coefficient of $CU$ is zero. Clearly none of these predictions is supported
by this evidence.

Equation (2.20.2) also has some adverse implications for theory A
which predicts that the coefficient of $\Delta ULC_{-1}$ should be zero and, in
order that ULC drops out, the coefficient of $\Delta ULC^N$ should be minus
that of $\Delta(ULC - ULC^N)$. We see that neither of these occurs. The fact
that $\Delta ULC^N$ is significant suggests either that none of these theories is
correct or that some firms are oligopolistic and some are not, the results
presented being an aggregate of these firms. The significance of $\Delta ULC_{-1}$
could be due to mis-specification, either of the explanatory part of the
equation or of the error term.

In conclusion, whilst theory A has some support from this evidence and 
theories B and C have none, we cannot accept that theory A is an adequate 
description of pricing. It does seem, however, that both cost and demand 
factors are required to explain prices. 

An additional useful reference for this question is: de Menil (1974). 


CHAPTER 3 


The multivariate linear model 


0. INTRODUCTION 


Econometric models typically involve a system of several equations 
designed to explain movements in more than one economic variable. One 
important special case of such a system is that in which the explanatory 
variables are all exogenous and non-random. In the present chapter, our 
questions will deal with various aspects of the problem of statistical 
inference in such models. Our notation will be similar to that in earlier 
chapters where we have used y and x (or X) to represent endogenous and 
exogenous variables respectively. But we will not be able to maintain a 
uniform notation throughout the chapter. For there is no single notation 
in the multivariate model which is suitable for all problems. We will 
employ here three different representations of the multivariate linear 
model each of which is in regular use in the literature. 

Our first representation of the multivariate model is 


\[
y_t = Ax_t + u_t \qquad (t = 1, \ldots, T) \qquad (3.0.1)
\]

where $y_t$ is a vector of $n$ endogenous variables ($y_{it}$: $i = 1, \ldots, n$) observed
at time $t$, $x_t$ is a vector of $m$ non-random exogenous variables ($x_{jt}$:
$j = 1, \ldots, m$), $A$ is an $n \times m$ matrix of unknown parameters and $u_t$ is a
vector of random disturbances. The representation (3.0.1) corresponds
with that in Malinvaud (1970b, Ch. 6). It is also commonly used for the
reduced form of a system of simultaneous equations (see Chapter 6).

Often there are restrictions on the parameter matrix A which we wish 
to utilise in the statistical treatment of (3.0.1); our two alternative 
representations of the model enable us to do so in a convenient way. Thus 
our second representation is the system of equations which is popularly 
known as Zellner’s seemingly unrelated regression model: 


Ym = XmBm + Um Rc Winter PF (3.0.2) 
(Zellner, 1962, Goldberger, 1964, pp. 262—265, Theil, 1971, Ch. 7) 


\ 
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where y,, is a vector of T observations on the mth endogeneous variable, 
Xm 1s aT x k,, matrix of observations of k,, exogeneous variables, B,, is a 
vector of k,, parameters and u,, is the T-vector of unobservable random 
disturbances on the mth equation. We see that the specification (3.0.2) 
allows us to incorporate directly any restrictions on the elements of the 
matrix A in (3.0.1) that arise from the exclusion of exogenous variables
from certain equations. However, the specification (3.0.2) does not allow 
us to take account directly of simple linear parameter restrictions across 
equations. 

A convenient way of incorporating this latter type of parameter 
restriction is used in our third representation: 


$y_t = X_t \delta + u_t \quad (t = 1, \ldots, T)$ (3.0.3)


where $y_t$ and $u_t$ are defined as in (3.0.1), $X_t$ is a matrix of exogenous
variables observed at time t and δ is a vector of unknown parameters. The
same exogenous variables may well occur in different positions of the
array $X_t$ and some elements of $X_t$ may well be zero. The fact that the
same elements of δ may occur in different equations of (3.0.3) means that
the specification allows in a simple way for across-equation parameter 
restrictions. 

To illustrate these different representations we may consider the 
following two-equation system: 


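$\begin{bmatrix} y_{1t} \\ y_{2t} \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} x_{1t} \\ x_{2t} \end{bmatrix} + \begin{bmatrix} u_{1t} \\ u_{2t} \end{bmatrix}$ (3.0.4)

which is already in the form (3.0.1). To rewrite (3.0.4) in the form of
(3.0.2) we would define

$y_1 = (y_{11}, \ldots, y_{1T})'$, $y_2 = (y_{21}, \ldots, y_{2T})'$,
$\beta_1 = (a_{11}, a_{12})'$, $\beta_2 = (a_{21}, a_{22})'$,
$X_1 = X_2 = \begin{bmatrix} x_{11} & x_{21} \\ \vdots & \vdots \\ x_{1T} & x_{2T} \end{bmatrix}$,
$u_1 = (u_{11}, \ldots, u_{1T})'$, $u_2 = (u_{21}, \ldots, u_{2T})'$.


To write (3.0.4) in the form of (3.0.3) we would define

$X_t = \begin{bmatrix} x_{1t} & x_{2t} & 0 & 0 \\ 0 & 0 & x_{1t} & x_{2t} \end{bmatrix}$
and
$\delta' = [a_{11} \;\; a_{12} \;\; a_{21} \;\; a_{22}]$.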
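The equivalence of the three representations can be checked numerically for a single observation. The following Python sketch is only an illustration: the coefficient values and the observation on the exogenous variables are hypothetical, and the disturbances are omitted so that only the systematic part of each form is compared. The last step assumes a cross-equation restriction $a_{12} = a_{21}$ to show how (3.0.3) absorbs it by shortening δ.

```python
# Illustrative (hypothetical) coefficient values with a12 = a21, and one observation x_t.
a11, a12, a21, a22 = 1, 2, 2, 4
x1t, x2t = 5, 7

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * e for m, e in zip(row, v)) for row in M]

# (3.0.1): systematic part A x_t
form1 = matvec([[a11, a12], [a21, a22]], [x1t, x2t])

# (3.0.2): one regression per equation; row t of X_m times beta_m
form2 = [matvec([[x1t, x2t]], [a11, a12])[0],
         matvec([[x1t, x2t]], [a21, a22])[0]]

# (3.0.3): X_t delta with delta = (a11, a12, a21, a22)'
X_t = [[x1t, x2t, 0, 0],
       [0, 0, x1t, x2t]]
form3 = matvec(X_t, [a11, a12, a21, a22])

# (3.0.3) with the restriction a12 = a21 imposed:
# delta shrinks to (a11, a12, a22)' and the shared coefficient gets one column.
X_t_restricted = [[x1t, x2t, 0],
                  [0, x1t, x2t]]
form3_restricted = matvec(X_t_restricted, [a11, a12, a22])

print(form1, form2, form3, form3_restricted)
```

All four vectors coincide, which is the sense in which the representations describe the same model; only (3.0.3) lets the restriction be imposed by construction.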


1. QUESTIONS 


Question 3.1 

In the following model 
$y_{1t} = \alpha_{10} + \alpha_{11} x_{1t} + \alpha_{12} x_{2t} + u_{1t}$
$y_{2t} = \alpha_{20} + \alpha_{21} x_{1t} + \alpha_{22} x_{2t} + u_{2t}$


the $x_{jt}$ are non-random and the $u_{it}$ are random disturbances with $E(u_{1t}) =
E(u_{2t}) = 0$, $E(u_{1t}^2) = \sigma_1^2$, $E(u_{2t}^2) = \sigma_2^2$ and $E(u_{1t}u_{2t}) = \sigma_{12}$ for all t, while
$E(u_{1s}u_{1t}) = E(u_{1s}u_{2t}) = E(u_{2s}u_{2t}) = 0$ for s ≠ t. The sample second
moment matrix below, in terms of deviations from means, was calculated
from 100 sample observations:


        y_1   y_2   x_1   x_2
y_1      20     5     3     3
y_2       5    50     2     4
x_1       3     2     9     1
x_2       3     4     1     9


(a) Find the best linear unbiased estimates of the parameters $\alpha_{11}$, $\alpha_{12}$, $\alpha_{21}$
and $\alpha_{22}$ in the above model.

(b) How would you estimate $\sigma_1^2$, $\sigma_2^2$ and $\sigma_{12}$?

(c) If it were known that $\sigma_{12} = 0$ how would this affect your estimates in
part (a)?


Question 3.2 


The observable random vectors $y_1, \ldots, y_T$ and the non-random vectors
$x_1, \ldots, x_T$ satisfy the system

$y_t = A x_t + u_t \quad (t = 1, \ldots, T)$ (3.2.1)


where A is an n × m matrix of unknown parameters and the $u_t$ are
mutually independent random vectors with zero mean and positive
definite covariance matrix Ω. It is assumed that T > m and that the
matrix $X' = [x_1, \ldots, x_T]$ has full rank m.

(a) If we consider (3.2.1) as an instance of the linear model

$y = \theta + u$ (3.2.2)


where $y' = (y_1', \ldots, y_T')$, $u' = (u_1', \ldots, u_T')$ and θ = E(y), show that
(3.2.1) implies that θ lies in the sub-space of nT dimensional Euclidean
space spanned by the columns of the matrix

$\begin{bmatrix} I \otimes x_1' \\ \vdots \\ I \otimes x_T' \end{bmatrix}$


(b) If A* is the least squares estimate of A and $x_{T+1}$ is known, find the
covariance matrix of the forecast error $y_{T+1} - A^* x_{T+1}$.


Question 3.3 

In the following model 
$y_{1t} = \alpha_{11} x_{1t} + \alpha_{12} x_{2t} + u_{1t}$
$y_{2t} = \alpha_{21} x_{1t} + \alpha_{22} x_{2t} + u_{2t}$


the $x_{jt}$ are non-random and the $u_{it}$ are random disturbances with

$E(u_{1t}) = E(u_{2t}) = 0$, $E(u_{1t}^2) = \sigma_1^2$, $E(u_{2t}^2) = \sigma_2^2$ and $E(u_{1t}u_{2t}) = \sigma_{12}$ for all
t, while $E(u_{1s}u_{1t}) = E(u_{2s}u_{2t}) = E(u_{1s}u_{2t}) = 0$ for s ≠ t. The following
sample second moment matrix was obtained from a sample of 100
observations:


Vi. 22 a SED 
Yo ee 10 e 1 3 
lame | 1 
Kye FZ SUGOFI 2 


(a) Obtain the best linear unbiased estimates of $\alpha_{11}$, $\alpha_{12}$, $\alpha_{21}$ and $\alpha_{22}$.
(b) Obtain an estimate of the variance of $\alpha_{12}^* - \alpha_{21}^*$, where $\alpha_{12}^*$ and $\alpha_{21}^*$
are your estimates of $\alpha_{12}$ and $\alpha_{21}$ respectively.
(c) If $\sigma_1^2$, $\sigma_2^2$ and $\sigma_{12}$ were known, would it be possible to obtain any
better estimates of the $\alpha_{ij}$ or any better estimates of the variance of
$\alpha_{12}^* - \alpha_{21}^*$?
(d) How would you test the hypothesis $H_0$: $\alpha_{12} = \alpha_{21}$ if it were known
that the disturbances $u_{it}$ are normally distributed?


(Adapted from University of Essex MA examinations, 1971.) 


Question 3.4
In the following model 

$y_{1t} = \alpha_{11} x_{1t} + \alpha_{12} x_{2t} + u_{1t}$

$y_{2t} = \alpha_{21} x_{3t} + \alpha_{22} x_{4t} + u_{2t}$
the $y_{it}$ (i = 1, 2) are endogenous variables and the $x_{jt}$ (j = 1, ..., 4) are
exogenous variables which we assume to be non-random. The $u_{it}$ are
serially independent random disturbances with zero means and second
moments given by

$E(u_{1t}^2) = \sigma^2$, $E(u_{2t}^2) = 2\sigma^2$ and $E(u_{1t}u_{2t}) = \sigma^2$ for all t.
Find the best linear unbiased estimates of $\alpha_{11}$, $\alpha_{12}$, $\alpha_{21}$ and $\alpha_{22}$ given the
following sample second moment matrix:

        y_1    y_2    x_1    x_2    x_3    x_4
y_1    2000    500   -200    400    200    100
y_2     500   1000    150   -200     30    -20
x_1    -200    150    100      0      0      0
x_2     400   -200      0    300      0      0
x_3     200     30      0      0     20     10
x_4     100    -20      0      0     10     10


(University of Essex MA examination, 1973.) 


Question 3.5 


In the following two-equation regression model

$\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} X_1 & 0 \\ 0 & X_2 \end{bmatrix} \begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix} + \begin{bmatrix} u_1 \\ u_2 \end{bmatrix}$

$y_i$ (i = 1, 2) is a vector of T observations on the ith dependent variable, $X_i$
is a matrix of observations on $k_i$ non-random independent variables, the $\beta_i$
are vectors of coefficients and the $u_i$ are vectors of disturbances for which
$E(u_1) = E(u_2) = 0$ and $E(u_1u_1') = \sigma_{11}I_T$, $E(u_2u_2') = \sigma_{22}I_T$ and $E(u_1u_2') =
\sigma_{12}I_T$. It is assumed that the matrix

$\Sigma = \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{bmatrix}$

is positive definite.

(a) Find the covariance matrix of the Aitken (or generalised least squares)
estimator of $\beta_1$.

(b) In the case where $X_1'X_2 = 0$, compare the covariance matrix in (a)
with the covariance matrix of the single equation least squares estimator
of $\beta_1$.


Question 3.6 

In the following model 
$y_{1t} = \alpha x_{1t} + \beta x_{2t} + u_{1t}$ (3.6.1)
$y_{2t} = \gamma x_{3t} + u_{2t}$ (3.6.2)


the $y_{it}$ (i = 1, 2) are endogenous variables and the $x_{jt}$ (j = 1, 2, 3) are
exogenous variables which we assume to be non-random. The $u_{it}$ are
serially independent random disturbances with zero means and second
moments given by $E(u_{1t}^2) = \sigma^2$, $E(u_{2t}^2) = 4\sigma^2$ and $E(u_{1t}u_{2t}) = 2\rho\sigma^2$ for all
t where |ρ| < 1.

(a) Find the best linear unbiased estimates of α, β and γ given the
following sample second moment matrix:


Se Bo) co We ye 
y, 100 20 4 1 
V2 20° “150 0 0 
ms 4 OO 2 
Ks i Ora-=2 5 
x3 0 1 0 0 5 


(b) Comment on any special feature of your results. 




Question 3.7 


(a) What do you understand by the following terms: 
(i) an efficient estimator 
(ii) an asymptotically efficient estimator. 

(b) In the following model 


Vig = Oy, + yt 
Yor = Brae + YX + Ure 


the $y_{it}$ (i = 1, 2) are endogenous variables and the $x_{jt}$ (j = 1, 2) are
non-random exogenous variables whose second moment matrix converges
as the sample size tends to infinity to the following positive definite limit:

        x_1   x_2
x_1       2     1
x_2       1     1


The $u_{it}$ (i = 1, 2) are serially independent random disturbances which have
the same bivariate normal distribution for each value of t with $E(u_{1t}) =
E(u_{2t}) = 0$ and covariance matrix

$\begin{bmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix}$ with |ρ| < 1.


Describe a procedure for obtaining asymptotically efficient estimates of
the parameters α, β and γ in the above model. Find the covariance matrix
of the limiting distribution of your estimates and compare the asymptotic
variance of your estimates of α and γ with that of alternative estimates of
each equation. Comment briefly on your results. 


(Adapted in part from University of Essex BA/MA examinations, 1975.) 


Question 3.8 

In the following model 
Vit = OX ye + Bra + uy 
Yar = Brae + yx + wae 


the $y_{it}$ (i = 1, 2) are endogenous variables, the $x_{jt}$ (j = 1, 2) are non-random
exogenous variables and the $u_{it}$ (i = 1, 2) are random disturbances which
satisfy the following conditions:


$E(u_{1t}) = E(u_{2t}) = 0$
$E(u_{1t}^2) = E(u_{2t}^2)$
$E(u_{1t}u_{2t}) = 0$
for all t, and
$E(u_{1s}u_{1t}) = E(u_{2s}u_{2t}) = E(u_{1s}u_{2t}) = 0$ (for s ≠ t)


Obtain the best linear unbiased estimates of α, β and γ from the following
sample second moment matrix:


Mais Vos ek ie wore 
VT he US ON eZ Omer GS 
V2 80 90) “25, 30 
On) 2027525 310 * 
en 45 20 5920 


2 


0 
(Adapted from University of Essex MA examinations, 1973.)


Question 3.9 
It is assumed that the system 
$y_{1t} = \alpha x_{1t} + \beta x_{2t} + u_{1t}$
$y_{2t} = \gamma x_{3t} + u_{2t}$
holds for all t and that:
ee 
(ii) the $x_{jt}$ (j = 1, 2, 3) are non-random, and
(iii) the $u_{it}$ are random disturbances which satisfy the following
conditions:
$E(u_{1t}) = E(u_{2t}) = 0$
$E(u_{1t}^2) = E(u_{2t}^2) = \sigma^2$
$E(u_{1t}u_{2t}) = 0$
for all t, and

$E(u_{1s}u_{1t}) = E(u_{1s}u_{2t}) = E(u_{2s}u_{2t}) = 0$

for s ≠ t. The following sample second moment matrix has been
computed from a sample of T observations of the vector $(y_{1t}, y_{2t}, x_{1t}, x_{2t}, x_{3t})$:


V1 2 EX eo x3 


sy ee a SOs Sa se 
V2 Ay ein sagas 2h a7 
x4 ge O22 1 1 
x4 Si ea eae Keele | 
x3 ee 1 4 


(a) Find the best linear unbiased estimates of α, β and γ.
(b) Find the variance of your estimate of γ as a function of $\sigma^2$ and T.


(Adapted from University of Essex MA examinations, 1973.) 


Question 3.10 


Expenditure by households on a set of commodities is related to 
household income by the system of equations 


$y_h = M_h \beta + u_h \quad (h = 1, \ldots, H)$ (3.10.1)
where
$y_h' = (y_{h1}, \ldots, y_{hn})$ is a vector of expenditures by household h in a
given period on n different commodities,

$M_h$ = income of household h in the given period,

$\beta' = (\beta_1, \ldots, \beta_n)$ is a vector of unknown coefficients, and
$u_h' = (u_{h1}, \ldots, u_{hn})$ is a vector of random disturbances.


The n commodities in the above model are assumed to be exhaustive, so
that, for all h, $i'y_h = M_h$ where i is the n × 1 sum vector defined by i' =
(1, 1, ..., 1). The disturbances {$u_h$: h = 1, ..., H} are assumed to be
independently and identically distributed random vectors with $E(u_h) = 0$
and $E(u_h u_h') = V$.

(a) Show that the coefficient vector β satisfies i'β = 1.

(b) Show that V is a singular matrix and find a basis for the null space of 
V if it is assumed that V has rank n — 1. 

(c) Since V has no inverse it is suggested that the parameter vector β be
estimated by constrained generalised least squares on a sub-set of n - 1
equations of the model. Prove that the covariance matrix of the errors in
this sub-system is non-singular and that the constrained generalised least
squares estimator of β is the same whichever equation is neglected. How
does this estimation procedure differ from single equation least squares 
applied to each equation of the system? 


Question 3.11 
An econometric model is given by the equation system 
$y_i = Z\delta_i + u_i \quad (i = 1, \ldots, n)$


where $y_i$ is a vector of T observations on the ith endogenous variable,
Z = [i X] where i' = (1, 1, ..., 1) is the 1 × T sum vector and X is a
T × m matrix of observations of m exogenous variables, $u_i$ is a vector of
disturbances, and $\delta_i$ a vector of unknown parameters. The $y_i$ are known to
satisfy the vector equation

$\sum_{i=1}^{n} a_i y_i = 0$ (3.11.1)

for known constants $a_1, \ldots, a_n$.

(a) If the $\delta_i$ are estimated by an ordinary least squares regression for each
equation of the model, will the calculated values of $y_i$ obtained from the
regression satisfy (3.11.1)?

(b) Does your result hold if (3.11.1) were changed to

$\sum_{i=1}^{n} a_i y_i = b$ (3.11.2)

for some constant vector b ≠ 0?


(University of Essex BA/MA examinations, 1975.) 


Question 3.12 


The following model describes the behaviour of the ith household at time
t

$y_{it} = \sum_{j=1}^{k} \beta_{ij} x_{ijt} + u_{it} \quad (i = 1, \ldots, n \text{ and } t = 1, \ldots, T)$ (3.12.1)


where $x_{ijt}$ is non-random, the $u_{it}$ are normally distributed with mean zero,
$E(u_{it}u_{jt}) = \sigma_{ij}$ for all i, j, and $E(u_{it}u_{js}) = 0$ for t ≠ s.

(a) Under what conditions can the system of equations (3.12.1) be validly
represented by the linear aggregate equation

$\bar{y}_t = \sum_{j=1}^{k} \beta_j \bar{x}_{jt} + \bar{u}_t \quad (t = 1, \ldots, T)$ (3.12.2)


where $\bar{y}_t = \sum_{i=1}^{n} y_{it}$, $\bar{x}_{jt} = \sum_{i=1}^{n} x_{ijt}$, $\bar{u}_t = \sum_{i=1}^{n} u_{it}$ and the $\beta_j$ are constants for
all j?
(b) Derive a test statistic for the hypothesis $\beta_{ij} = \beta_j$ for all i, j in each of
the following cases:
(i) the $\sigma_{ij}$ equal $\sigma^2$ for all i, j,
(ii) the $\sigma_{ij}$ are known but not equal, and
(iii) the $\sigma_{ij}$ are unknown.


2. SUPPLEMENTARY QUESTIONS 


Question 3.13 
In the model 
$y_{1t} = a_{11} x_{1t} + a_{12} x_{2t} + u_{1t}$
$y_{2t} = a_{21} x_{1t} + a_{22} x_{2t} + u_{2t}$
it is assumed that the exogenous variables $x_{it}$ (i = 1, 2) are non-random
and that
(1) ay, Fay. = ayy 
34° G19. 93 
(ii) the $u_{it}$ (i = 1, 2) are serially independent random disturbances which
are identically distributed for all t with first and second moments
given by:

$E(u_{1t}) = E(u_{2t}) = 0$
E(u?) == 207, E(u},) = ‘gh E(u ypu) = 6 


The following sample second moment matrix has been computed from a
sample of T observations of the vector $(y_{1t}, y_{2t}, x_{1t}, x_{2t})$:


2 


v1 Yo %1 X22 
Aarne 9 EEK Hak gerry 
yy =10y 20. 1 I 
x4 3 1 Lek 
Xa 1 Orr | 2 


(a) Find the best linear unbiased estimates of the coefficients $a_{11}$, $a_{12}$, $a_{21}$
and $a_{22}$.
(b) Find expressions for the variances of your estimates of a, and a2, as 
functions of T and o?. 


(Adapted from University of Birmingham M Soc.Sc. examinations, 1977.) 


Question 3.14 

In the model: 
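$y_{1t} = a_{11} x_{1t} + a_{12} x_{2t} + u_{1t}$
$y_{2t} = a_{21} x_{1t} + a_{22} x_{2t} + a_{23} x_{3t} + u_{2t}$

the $x_{jt}$ are non-random, the $u_{it}$ are normally distributed with:
$E(u_{1t}) = E(u_{2t}) = 0$
$E(u_{1t}^2) = 2\sigma^2$, $E(u_{2t}^2) = \sigma^2$, $E(u_{1t}u_{2t}) = 0$


$E(u_{it}u_{js}) = 0$ for i, j = 1, 2 and t ≠ s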


A sample size T = 72 gives the following data on second order moments 


Vie boa) 
Vou cae ee 
Kime £6 2a 0 
ney oe Soa 


Ne Ae es 


(a) Calculate best linear unbiased estimates of the coefficients $a_{ij}$ subject
to the restriction $a_{12} = a_{23}$.

(b) If o? = 1, test the hypothesis Hy: a,, + a,. =4 against the alternative 
Ay: ay, ie #4, 
(c) If the restriction $a_{12} = a_{23}$ were not imposed, how would you estimate
the coefficients?


(University of Essex MA examinations, 1977.) 


Question 3.15 

In the model 
$y_{1t} = \alpha x_{1t} + \beta x_{2t} + u_{1t}$ (3.15.1)
$y_{2t} = \beta x_{1t} + \gamma x_{2t} + u_{2t}$ (3.15.2)


it is known that the exogenous variables $x_{jt}$ (j = 1, 2) are non-random and
that
y= ee 8, 
(ii) the $u_{it}$ (i = 1, 2) are serially independent, random disturbances
which have the same bivariate normal distribution for all t with first
and second moments given by:


$E(u_{1t}) = E(u_{2t}) = 0$
$E(u_{1t}^2) = \sigma^2$, $E(u_{2t}^2) = 2\sigma^2$, $E(u_{1t}u_{2t}) = \sigma^2$


It is also known that the second moment matrix of the $x_{jt}$ (j = 1, 2)
converges as the sample size tends to infinity to the limit 


od 7 
a ee a6” 
Mate) GS" 


(a) Describe a procedure for obtaining asymptotically efficient estimates
of the parameters α and β in the above model.

(b) Find the covariance matrix of the limiting distribution of your
estimates in (a) and compare the asymptotic variances of your estimates
of α and β with those of the two alternative sets of estimates of these
parameters obtained by the application of ordinary least squares to each
equation of (3.15.1-2).


Question 3.16 
In the model 


Vit = Ay + Pry + Uy (3.06.1) 
Yor = AX + Ure (3.16.2) 
Yar = Brig + Ue (3.16-3) 


the $x_{jt}$ (j = 1, 2) are non-random exogenous variables whose second
moment matrix is known to converge to the identity matrix as the sample
size tends to infinity. The $u_{it}$ (i = 1, 2, 3) are serially independent random
disturbances which have the same multivariate normal distribution for all
t with first and second moments given by


$E(u_{1t}) = E(u_{2t}) = E(u_{3t}) = 0$


$E(u_{1t}^2) = E(u_{2t}^2) = E(u_{3t}^2) = \sigma^2$
and

$E(u_{it}u_{jt}) = 0$ (i ≠ j)
(a) Describe a procedure for obtaining asymptotically efficient estimates
of α and β.
(b) Can an asymptotically efficient estimate of α (or β) be obtained by
neglecting one of the equations in (3.16.1-3)? Why?




Question 3.17 

In the model 
$y_{1t} = a_{11} x_{1t} + a_{12} x_{2t} + u_{1t}$
$y_{2t} = a_{21} x_{1t} + a_{22} x_{2t} + u_{2t}$


the $x_{jt}$ (j = 1, 2) are non-random exogenous variables and the $u_{it}$ (i = 1, 2)
are serially independent random disturbances with the same bivariate
normal distribution for all values of t in which the mean vector is zero and
the covariance matrix is the known non-singular matrix Ω. The following
tests are to be considered:
(1) test Ho(1): ay; = @y2 = ay, = ay 
against H,(1): Ho(1) is false 

(2) test Ho(2): 

against H,(2): a,,; # a2 
(3) test H)(3): 


against H,(3): a4; = a) 


a4; = 42 


= oh Cay Gin) 


(a) Show how to construct F-statistics for testing each of the above 
hypotheses. 

(b) If $F_1$, $F_2$ and $F_3$ are the F-statistics for testing hypotheses (1), (2) and
(3), respectively, verify that if the sample size is 22 then


4a Pe =] 


2p ape lEOchn 


(3.17.1)


(c) If F; = 2.80 and F, = 1.60 which of the three null hypotheses should 
be accepted and which rejected at the 5% level of significance? Comment
on your result.

(d) Show how to construct χ²-statistics for testing the above hypotheses.
Is there any relationship among these statistics corresponding to (3.17.1)
for the F-statistics?

(e) Is there any reason for preferring the tests based on the χ²-statistics to
those based on the F-statistics?


(Adapted in part from University of Essex MA examinations, 1976.) 


Question 3.18 
The following model describes the behaviour of the ith household at time
t

$y_{it} = \sum_{j=1}^{k} \beta_j x_{ijt} + u_{it} \quad (i = 1, \ldots, n \text{ and } t = 1, \ldots, T)$ (3.18.1)


where $x_{ijt}$ is non-random, $E(u_{it}) = 0$, $E(u_{it}u_{jt}) = \sigma_{ij}$ for all i, j, $E(u_{it}u_{js}) =
0$ for t ≠ s. The corresponding macro-model is

$\bar{y}_t = \sum_{j=1}^{k} \beta_j \bar{x}_{jt} + \bar{u}_t \quad (t = 1, \ldots, T)$ (3.18.2)


where $\bar{y}_t = \sum_{i=1}^{n} y_{it}$, $\bar{x}_{jt} = \sum_{i=1}^{n} x_{ijt}$ and $\bar{u}_t = \sum_{i=1}^{n} u_{it}$. Compare the
efficiency of the generalised least squares estimators of $\beta_j$ for j = 1, ..., k
obtained from the system of micro-equations (3.18.1) and the
macro-equation (3.18.2) when

(a) $\sigma_{ij} = \sigma^2$ and

(b) the $\sigma_{ij}$ are known but not equal.

(c) Reconsider your answers to parts (a) and (b) when $x_{ijt} = x_{jt}$ for all i,
j, t.


3. SOLUTIONS 


Solution 3.1 


The given model is an instance of the multiple equation model (3.0.1), 
that is 


$y_t = A x_t + u_t$ (3.1.1)
Our problem is to estimate A using the sample observations {$y_t$, $x_t$:
t = 1, ..., T}. We mention the important result that when there are no
restrictions on the elements of the parameter matrix A, the best linear
unbiased estimator of A is obtained by applying the method of least
squares to each equation of the model (3.1.1) (Malinvaud, 1970, pp.
205-206).

Note that the ith equation of (3.1.1) can be written as


$y_{it} = a_i' x_t + u_{it} = x_t' a_i + u_{it}$ (3.1.2)


where $a_i'$ is the ith row of A; and if we now write down (3.1.2) for each of
the sample observations t = 1, ..., T we have the system


$y_i = X a_i + u_i$ (3.1.3)
where
$y_i = (y_{i1}, \ldots, y_{iT})'$, $u_i = (u_{i1}, \ldots, u_{iT})'$ and $X' = [x_1, \ldots, x_T]$


(in this notation $y_i$ is the vector of T observations on the ith endogenous
variable). Applying ordinary least squares to (3.1.3) we obtain


$a_i^* = (X'X)^{-1} X'y_i$


and this formula holds good for all values of i = 1, ..., n. We note that
$A = \begin{bmatrix} a_1' \\ \vdots \\ a_n' \end{bmatrix}$
so that


$A' = [a_1, \ldots, a_n]$
and the corresponding ordinary least squares estimator of A' is
$A^{*\prime} = (X'X)^{-1} X'Y$ (3.1.4)


where $Y = [y_1, \ldots, y_n]$. Note that Y is a T × n matrix of observations
on the n endogenous variables and we can write this matrix in the
alternative transposed form as $Y' = [y_1, \ldots, y_T]$ where $y_t = (y_{1t},
y_{2t}, \ldots, y_{nt})'$.
It follows by transposing (3.1.4) that


$A^* = Y'X(X'X)^{-1} = \left(\frac{1}{T} Y'X\right) \left(\frac{1}{T} X'X\right)^{-1}$.

Now

$\frac{1}{T} Y'X = \frac{1}{T} \sum_{t=1}^{T} y_t x_t' = M_{yx}$
say, and
$\frac{1}{T} X'X = \frac{1}{T} \sum_{t=1}^{T} x_t x_t' = M_{xx}$


say, where we have used $M_{yx}$ and $M_{xx}$ to denote sample second moments
of the data. We can then write A* in the alternative form (c.f. Malinvaud,
1970, pp. 205-206)


$A^* = M_{yx} M_{xx}^{-1}$ (3.1.5)
and, of course,

$A^{*\prime} = M_{xx}^{-1} M_{xy}$
corresponding to (3.1.4).


When a constant term occurs in each equation of the model we can 
rewrite (3.1.1) more explicitly as 


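$y_t = b + A x_t + u_t$


where b is a vector of constants. Then the best linear unbiased estimator
of the augmented matrix [b : A] is, from (3.1.5),


$[b^* : A^*] = [\bar{y} : M_{yx}] \begin{bmatrix} 1 & \bar{x}' \\ \bar{x} & M_{xx} \end{bmatrix}^{-1}$ (3.1.6)


where $\bar{y} = (\bar{y}_i)$ with $\bar{y}_i = (\sum_{t=1}^{T} y_{it})/T$ and $\bar{x} = (\bar{x}_j)$ with $\bar{x}_j = (\sum_{t=1}^{T} x_{jt})/T$.
Inverting the partitioned matrix on the right side of (3.1.6) and solving
the resulting equations for b* and A* as in Solution 2.1 (see also
Malinvaud, 1970 pp. 211-212) we obtain


$A^* = \bar{M}_{yx} \bar{M}_{xx}^{-1}$ (3.1.7)
and

$b^* = \bar{y} - A^* \bar{x}$
where

$\bar{M}_{yx} = (1/T) \sum_{t=1}^{T} (y_t - \bar{y})(x_t - \bar{x})'$

$\bar{M}_{xx} = (1/T) \sum_{t=1}^{T} (x_t - \bar{x})(x_t - \bar{x})'$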

Part (a) Turning now to the question, we notice that there are no 
restrictions on the coefficients to be estimated. Hence, the best linear 
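unbiased estimates of the coefficients of $x_{1t}$ and $x_{2t}$ are [using (3.1.7) and
the given data]:

$\begin{bmatrix} \alpha_{11}^* & \alpha_{12}^* \\ \alpha_{21}^* & \alpha_{22}^* \end{bmatrix} = \begin{bmatrix} 3 & 3 \\ 2 & 4 \end{bmatrix} \begin{bmatrix} 9 & 1 \\ 1 & 9 \end{bmatrix}^{-1} = \frac{1}{80} \begin{bmatrix} 3 & 3 \\ 2 & 4 \end{bmatrix} \begin{bmatrix} 9 & -1 \\ -1 & 9 \end{bmatrix} = \begin{bmatrix} 0.3 & 0.3 \\ 0.175 & 0.425 \end{bmatrix}$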
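The arithmetic of Part (a) can be reproduced with exact fractions. The sketch below is an illustration only; it assumes that the cross moments of (y_1, y_2) with (x_1, x_2) are [[3, 3], [2, 4]] and that the moment matrix of the exogenous variables is [[9, 1], [1, 9]], and the helper names `matmul` and `inv2` are ours, not part of any library.

```python
from fractions import Fraction as F

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inv2(M):
    """Inverse of a 2 x 2 matrix, kept exact with Fractions."""
    (a, b), (c, d) = M
    det = F(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

# Assumed sample moments in deviations from means (T = 100 observations).
M_yx = [[3, 3], [2, 4]]
M_xx = [[9, 1], [1, 9]]

# Formula (3.1.7): least squares applied equation by equation.
A_star = matmul(M_yx, inv2(M_xx))
print([[float(v) for v in row] for row in A_star])
```

Each row of `A_star` holds the slope estimates for one equation, so the second row should reproduce the values 0.175 and 0.425 quoted in the text.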
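Part (b) Estimates of $\sigma_1^2$, $\sigma_2^2$ and $\sigma_{12}$ can be obtained in the usual way
(c.f. Malinvaud, 1970b, pp. 209-210) from the sample moments of
the residuals {$u_{it}^*$: i = 1, 2; t = 1, ..., T} from the least squares
regression. Writing


$\Omega = \begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{bmatrix}$


we estimate Ω by

$M_{uu}^* = \frac{1}{T} \sum_{t=1}^{T} u_t^* u_t^{*\prime}$

using the formula (Malinvaud, 1970b, p. 210)
$M_{uu}^* = M_{yy} - A^* M_{xy}$


where $M_{yy} = T^{-1} \sum_{t=1}^{T} y_t y_t'$. From the given data we have, in this case,
$M_{uu}^* = \begin{bmatrix} 20 & 5 \\ 5 & 50 \end{bmatrix} - \begin{bmatrix} 0.3 & 0.3 \\ 0.175 & 0.425 \end{bmatrix} \begin{bmatrix} 3 & 2 \\ 3 & 4 \end{bmatrix} = \begin{bmatrix} 18.2 & 3.2 \\ 3.2 & 47.95 \end{bmatrix}$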
Remark We remark that $M_{uu}^*$ is not an unbiased estimator of Ω. But the
following rescaled matrix is an unbiased estimator of Ω (c.f. Malinvaud,
1970b, p. 210):

$\frac{T}{T - m} M_{uu}^*$

where m is the number of exogenous variables in the model (in this case
m = 3).
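The residual moment matrix and its rescaled version can be computed in the same way. This is a sketch under stated assumptions: the second moments of the endogenous variables are taken to be [[20, 5], [5, 50]] and their cross moments with the exogenous variables [[3, 2], [3, 4]], as we read them from the question, with T = 100 and m = 3 (two regressors plus the constant).

```python
from fractions import Fraction as F

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

T, m = 100, 3                      # sample size; regressors including the constant
M_yy = [[20, 5], [5, 50]]          # assumed moments of y_1, y_2
M_xy = [[3, 2], [3, 4]]            # assumed moments of (x_1, x_2) with (y_1, y_2)
A_star = [[F(3, 10), F(3, 10)],    # least squares slopes from Part (a)
          [F(7, 40), F(17, 40)]]

# M*_uu = M_yy - A* M_xy: moment matrix of the least squares residuals.
explained = matmul(A_star, M_xy)
M_uu = [[M_yy[i][j] - explained[i][j] for j in range(2)] for i in range(2)]

# Rescaling by T / (T - m) removes the downward bias of M*_uu.
Omega_unbiased = [[F(T, T - m) * v for v in row] for row in M_uu]

print([[float(v) for v in row] for row in M_uu])
print([[round(float(v), 3) for v in row] for row in Omega_unbiased])
```

With T = 100 the correction factor is 100/97, so the two estimates differ by about three per cent.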


Part (c) The estimates of the coefficients $\alpha_{ij}$ (i = 1, 2; j = 0, 1, 2) obtained
in Part (a) are the best linear unbiased estimates regardless of the value of
$\sigma_{12}$ in the present model (Malinvaud, 1970, p. 206).

What this means is that there is no information contained in the
specification of either equation which is useful in the estimation of the
other. We may well have expected this to be the case when $\sigma_{12} = 0$ and
the errors on the two equations are uncorrelated. For then the
endogenous variables $y_{1t}$ and $y_{2t}$ are uncorrelated and the two equations
are in a sense quite separate. But when $\sigma_{12} \neq 0$ the two endogenous
variables are correlated and it is an interesting result that an ordinary least
squares regression on the two equations separately still produces the best
linear unbiased estimates.

We will see in our later examples in this chapter that an ordinary least
squares regression on the two equations does not produce the best linear
unbiased estimates of all of the unknown coefficients when some of these
coefficients satisfy known restrictions; and, moreover, there are some
cases where this is true even when $\sigma_{12} = 0$.
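The result that least squares on each equation coincides with generalised least squares on the whole system, when every equation has the same regressors, can be verified on a small example. The sketch below uses hypothetical data (four observations, two equations) and an arbitrary positive definite disturbance covariance matrix; all helper functions are our own.

```python
from fractions import Fraction as F

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def kron(A, B):
    """Kronecker product of A and B (lists of rows)."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def solve(M, rhs):
    """Solve M z = rhs by Gauss-Jordan elimination in exact arithmetic."""
    n = len(M)
    aug = [[F(v) for v in row] + [F(r)] for row, r in zip(M, rhs)]
    for c in range(n):
        p = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[-1] for row in aug]

# Hypothetical data: the same regressor matrix X appears in both equations.
X = [[1, 0], [1, 1], [1, 2], [1, 4]]
y1 = [1, 2, 2, 5]
y2 = [0, 1, 3, 2]
Sigma_inv = [[F(3, 5), F(-1, 5)], [F(-1, 5), F(2, 5)]]   # inverse of [[2, 1], [1, 3]]

def ols(X, y):
    Xt = transpose(X)
    return solve(matmul(Xt, X), [r[0] for r in matmul(Xt, [[v] for v in y])])

# Stacked system: block-diagonal regressors, covariance Sigma (x) I_T.
X_big = kron(identity(2), X)
W = matmul(transpose(X_big), kron(Sigma_inv, identity(4)))
gls = solve(matmul(W, X_big), [r[0] for r in matmul(W, [[v] for v in y1 + y2])])
ols_stacked = ols(X, y1) + ols(X, y2)

print(gls == ols_stacked)
```

Changing `Sigma_inv` to any other positive definite matrix leaves `gls` unchanged, which is the point of Part (c); giving the two equations different regressor matrices breaks the equality.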


Solution 3.2 


Part (a) We first write (3.2.1) complete with T observations as


$\begin{bmatrix} y_1 \\ \vdots \\ y_T \end{bmatrix} = \begin{bmatrix} A x_1 \\ \vdots \\ A x_T \end{bmatrix} + \begin{bmatrix} u_1 \\ \vdots \\ u_T \end{bmatrix}$


and note that
$A x_t = \text{vec}(A x_t) = (I_n \otimes x_t') \text{vec}(A)$


since $A x_t$ is already a vector. [In this operation we have used the vec( )
operator which we discuss fully in Appendix A and the right hand
Kronecker product ⊗; for a discussion of the algebra of Kronecker
products the reader is referred to Theil (1971, pp. 303-306) and
Dhrymes (1970, pp. 155-156)]. We then have


$\begin{bmatrix} y_1 \\ \vdots \\ y_T \end{bmatrix} = \begin{bmatrix} I \otimes x_1' \\ \vdots \\ I \otimes x_T' \end{bmatrix} \text{vec}(A) + \begin{bmatrix} u_1 \\ \vdots \\ u_T \end{bmatrix}$
or
$y = W \text{vec}(A) + u$. (3.2.3)


This representation of (3.2.1) shows that if θ = E(y) as in (3.2.2) then θ
can always be written as a linear combination of the columns of W. Thus,
θ, which is an nT-component vector, lies in the sub-space of
nT-dimensional Euclidean space spanned by the columns of W (for a
definition of Euclidean space see Halmos, 1958, p. 121). Thus, θ lies in
the range space of W. If we call this latter space L then the dimension of
L is given by the rank of the matrix W (Halmos, 1958, p. 90). Now


$W = \begin{bmatrix} x_1' & 0 & \cdots & 0 \\ 0 & x_1' & \cdots & 0 \\ \vdots & & & \vdots \\ 0 & 0 & \cdots & x_1' \\ \vdots & & & \vdots \\ x_T' & 0 & \cdots & 0 \\ 0 & x_T' & \cdots & 0 \\ \vdots & & & \vdots \\ 0 & 0 & \cdots & x_T' \end{bmatrix}$

where the column blocks are numbered (1), (2), ..., (n) from left to right,
and we see that this matrix will have rank nm if each of the submatrices of
W based on the column blocks (1), (2), ..., (n) have rank m. But each of
these column blocks will have rank less than m only if there exists a
non-zero vector λ for which


$x_t'\lambda = 0$ (for all t = 1, ..., T)
that is, only if there exists a non-zero vector λ for which
$X\lambda = 0$


But X has full rank (= m) by assumption, so that each of the column
blocks (1), (2), ..., (n) has rank m. Hence, W has rank nm.
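A small numerical illustration of Part (a) may help. The sketch below assumes hypothetical values with n = 2 equations, m = 3 exogenous variables and T = 4 observations, builds W by stacking the blocks I ⊗ x_t', and checks both that W vec(A) reproduces the stacked vectors A x_t and that W has rank nm. Here vec(A) stacks the rows of A, which is the convention that makes the identity used in the text hold.

```python
from fractions import Fraction as F

def kron(A, B):
    """Kronecker product of A and B (lists of rows)."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def rank(M):
    """Rank by exact Gaussian elimination."""
    rows = [[F(v) for v in r] for r in M]
    rk = 0
    for c in range(len(rows[0])):
        p = next((r for r in range(rk, len(rows)) if rows[r][c] != 0), None)
        if p is None:
            continue
        rows[rk], rows[p] = rows[p], rows[rk]
        for r in range(rk + 1, len(rows)):
            f = rows[r][c] / rows[rk][c]
            rows[r] = [a - f * b for a, b in zip(rows[r], rows[rk])]
        rk += 1
    return rk

n, m = 2, 3
A = [[1, 2, 3], [4, 5, 6]]                           # hypothetical parameters
xs = [[1, 0, 2], [0, 1, 1], [1, 1, 0], [2, 0, 1]]    # x_1, ..., x_T (X has rank m)

vec_A = [a for row in A for a in row]                # rows of A stacked
W = [row for x in xs for row in kron(identity(n), [x])]   # blocks I (x) x_t'
theta = [v for x in xs for v in matvec(A, x)]        # stacked A x_t = E(y)

print(matvec(W, vec_A) == theta, rank(W))
```

If two of the vectors in `xs` were replaced so that X lost full column rank, `rank(W)` would fall below nm, in line with the argument above.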
Remark 1 Since W has full rank nm, we can always find the elements of
A once we are given θ in the sub-space L. For we can always find a vector
a such that


θ = Wa
and then
W'θ = W'Wa


and

$a = (W'W)^{-1} W'\theta$
The elements of A can now be recovered by rearranging the elements of a
into an n × m matrix, i.e. reversing the rule by which a = vec(A).


Remark 2 Note that a is obtained uniquely from θ by means of the linear
transformation $(W'W)^{-1}W'$.


Remark 3 According to the geometrical interpretation of least squares
(see, in particular, Seber, 1964, p. 13) we know that the least squares
estimator of θ is obtained as the orthogonal projection of the observed
vector y onto the sub-space L in which θ is known to lie. Since L is, in this
case, the range space of W we have, for the least squares estimator of θ,
the vector


$\hat{\theta} = W(W'W)^{-1}W'y$
(c.f. Seber, 1964, p. 14) and $W(W'W)^{-1}W'$ is the matrix which projects
vectors in nT-dimensional space orthogonally onto L. It now follows from
Remark 2 that the corresponding estimator of a is obtained uniquely from
$\hat{\theta}$ as

$\hat{a} = (W'W)^{-1}W'\hat{\theta} = (W'W)^{-1}W'y$ (3.2.4)
which corresponds to the familiar formula for the least squares estimator
applied directly in (3.2.3). We leave it as an exercise for the reader to
verify that $\hat{a}$ as given by (3.2.4) is consistent with the formula (3.1.5)
given in the last solution for the least squares estimator of A.


Part (b) We write the forecast error as
$y_{T+1} - A^* x_{T+1} = (A - A^*) x_{T+1} + u_{T+1}$
Then, the forecast error covariance matrix is
$E[(y_{T+1} - A^* x_{T+1})(y_{T+1} - A^* x_{T+1})']$
$= E(u_{T+1}u_{T+1}') + E[(A - A^*) x_{T+1} x_{T+1}' (A - A^*)']$


at least under the additional assumption that $u_{T+1}$ is statistically
independent of $u_t$ for t ≤ T. We take


$E(u_{T+1}u_{T+1}') = \Omega$


and note that
$(A - A^*) x_{T+1} = \text{vec}[(A - A^*) x_{T+1}]$
$= (I \otimes x_{T+1}') \text{vec}(A - A^*)$
It now follows that the forecast error covariance matrix is
$\Omega + E\{(I \otimes x_{T+1}') [\text{vec}(A - A^*)] [\text{vec}(A - A^*)]' (I \otimes x_{T+1})\}$
$= \Omega + (I \otimes x_{T+1}') E\{[\text{vec}(A - A^*)] [\text{vec}(A - A^*)]'\} (I \otimes x_{T+1})$
$= \Omega + (I \otimes x_{T+1}') (\Omega \otimes (X'X)^{-1}) (I \otimes x_{T+1})$ (3.2.5)


since the covariance matrix of vec(A* - A) is given by (Malinvaud, 1970,
p. 209; Goldberger, 1964, p. 209):


$\frac{1}{T} \Omega \otimes M_{xx}^{-1} = \Omega \otimes (X'X)^{-1}$ (3.2.6)
where $M_{xx} = (1/T) \sum_{t=1}^{T} x_t x_t' = (1/T) X'X$ and $X' = [x_1, \ldots, x_T]$.


Multiplying out the matrices on the right side of (3.2.5) above we
obtain


$\Omega + \Omega \otimes x_{T+1}'(X'X)^{-1}x_{T+1} = \Omega [1 + x_{T+1}'(X'X)^{-1}x_{T+1}]$


since $x_{T+1}'(X'X)^{-1}x_{T+1}$ is a scalar.

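The last simplification, from the Kronecker product expression to Ω multiplied by a scalar, can be confirmed numerically. The sketch below is an illustration with hypothetical values for Ω, X'X and the new observation on the exogenous variables (n = m = 2); the helper functions are our own.

```python
from fractions import Fraction as F

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def kron(A, B):
    """Kronecker product of A and B (lists of rows)."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def inv2(M):
    (a, b), (c, d) = M
    det = F(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

Omega = [[2, 1], [1, 3]]        # hypothetical disturbance covariance matrix
XtX = [[4, 7], [7, 21]]         # hypothetical X'X
x_next = [1, 3]                 # hypothetical x_{T+1}

XtX_inv = inv2(XtX)
row_block = kron(identity(2), [x_next])                 # I (x) x'_{T+1}, 2 x 4
col_block = kron(identity(2), [[v] for v in x_next])    # I (x) x_{T+1},  4 x 2

# Second term of (3.2.5), computed the long way.
extra = matmul(matmul(row_block, kron(Omega, XtX_inv)), col_block)

# The scalar x'_{T+1} (X'X)^{-1} x_{T+1}.
q = matmul(matmul([x_next], XtX_inv), [[v] for v in x_next])[0][0]

forecast_cov = [[Omega[i][j] + extra[i][j] for j in range(2)] for i in range(2)]
print(q, forecast_cov == [[(1 + q) * v for v in row] for row in Omega])
```

The forecast error covariance matrix is therefore always proportional to Ω, with a factor that grows as the new observation moves away from the region covered by the sample.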
Solution 3.3 


Part (a) As in question 3.1 the best linear unbiased estimates of the 
coefficients are obtained by least squares on each equation. We have 


is called F Ap 
Q3, 3p ay 3 I des 


1 


ll 
NS 
©o NO 
eas 
| 
= N 
| 
— 
aT 
II 
| 
me NO 
ho. SS 
es 


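Part (b) We introduce the long vector $\alpha' = (\alpha_{11}, \alpha_{12}, \alpha_{21}, \alpha_{22})$ so that the
covariance matrix of α*, given by (3.2.6) (Goldberger, 1964, p. 209;
Malinvaud, 1970, p. 209), is

$E(\alpha^* - \alpha)(\alpha^* - \alpha)' = \frac{1}{T} \Omega \otimes M_{xx}^{-1} = \frac{1}{100} \begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{bmatrix} \otimes \begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix}$

Hence, using the fact that if c'α* is a linear combination of the elements
of α* then var(c'α*) = c'V(α*)c, where V(α*) is the covariance matrix
of α*, we have


$\text{var}(\alpha_{12}^* - \alpha_{21}^*) = [0, 1, -1, 0] [E(\alpha^* - \alpha)(\alpha^* - \alpha)'] [0, 1, -1, 0]'$

$= \frac{1}{100} [-\sigma_1^2 - 2\sigma_{12}, \; \sigma_1^2 + \sigma_{12}, \; -\sigma_{12} - 2\sigma_2^2, \; \sigma_{12} + \sigma_2^2] [0, 1, -1, 0]'$

$= [(\sigma_1^2 + \sigma_{12}) + (\sigma_{12} + 2\sigma_2^2)]/100$
$= (\sigma_1^2 + 2\sigma_2^2 + 2\sigma_{12})/100$ (3.3.1)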

We now estimate the parameters $\sigma_1^2$, $\sigma_2^2$ and $\sigma_{12}$ using the sample moment
matrix of residuals

$M_{uu}^* = \frac{1}{T} \sum_{t=1}^{T} u_t^* u_t^{*\prime}$

where $u_t^{*\prime} = (u_{1t}^*, u_{2t}^*)$ is the vector of residuals on each equation. We
calculate this matrix from the formula (c.f. solution 3.1 and Malinvaud,
1970, p. 210)
$M_{uu}^* = M_{yy} - A^* M_{xy} = \begin{bmatrix} 1 & 0 \\ 0 & 5 \end{bmatrix}$

Hence, our estimate of $\text{var}(\alpha_{12}^* - \alpha_{21}^*)$ is
[1 + 2(5)]/100 = 11/100 = 0.11
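Formula (3.3.1) and the numerical estimate can be checked by forming the Kronecker product explicitly. The sketch below assumes, as the derivation implies, that the inverse moment matrix of the exogenous variables is [[2, -1], [-1, 1]] (the inverse of [[1, 1], [1, 2]]) and that T = 100; the function name `var_diff` is ours.

```python
from fractions import Fraction as F

def kron(A, B):
    """Kronecker product of A and B (lists of rows)."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

M_xx_inv = [[2, -1], [-1, 1]]   # assumed inverse of M_xx = [[1, 1], [1, 2]]
c = [0, 1, -1, 0]               # picks out alpha_12 - alpha_21

def var_diff(omega, T=100):
    """c' [ (1/T) Omega (x) M_xx^{-1} ] c for a 2 x 2 matrix omega."""
    V = kron(omega, M_xx_inv)
    quad = sum(c[i] * V[i][j] * c[j] for i in range(4) for j in range(4))
    return F(quad, T)

# Generic check against (3.3.1): (s11 + 2*s22 + 2*s12) / 100.
s11, s22, s12 = 3, 4, 1
generic = var_diff([[s11, s12], [s12, s22]])

# Estimate using the residual moments quoted in the text (1, 5 and 0).
estimate = var_diff([[1, 0], [0, 5]])

print(generic, estimate)
```

The first value confirms the algebra of (3.3.1) for arbitrary second moments; the second reproduces the estimate 0.11.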


Part (c) As in Part (c) of Solution 3.1, since there are no prior restrictions
on the coefficients of the model, the knowledge of $\sigma_1^2$, $\sigma_2^2$ and $\sigma_{12}$ does
not enable us to obtain more efficient linear unbiased estimates of the $\alpha_{ij}$.
On the other hand, knowledge of $\sigma_1^2$, $\sigma_2^2$ and $\sigma_{12}$ does enable us to state
the variance of $\alpha_{12}^* - \alpha_{21}^*$ precisely as (3.3.1); whereas without this
knowledge we must estimate this variance; and the variance of
an estimate of (3.3.1) is always at least as great as the variance of the
non-random true value of (3.3.1) which is, of course, zero.


Part (d) When the disturbances $u_{it}$ are normally distributed the estimates
$\alpha_{ij}^*$ are also normally distributed and so too is the linear combination
$\alpha_{12}^* - \alpha_{21}^*$. Thus
$[(\alpha_{12}^* - \alpha_{21}^*) - (\alpha_{12} - \alpha_{21})]^2 / \text{var}(\alpha_{12}^* - \alpha_{21}^*)$
is distributed as $\chi^2$ with one degree of freedom. If $\sigma_1^2$, $\sigma_2^2$ and $\sigma_{12}$ were
known, then to test $H_0$ we would compute the statistic
$\chi^2 = \frac{(\alpha_{12}^* - \alpha_{21}^*)^2}{(\sigma_1^2 + 2\sigma_2^2 + 2\sigma_{12})/100}$
and reject $H_0$ if $\chi^2 > \chi_1^2(\alpha)$, where α is the level of significance and $\chi_1^2(\alpha)$
is the tabulated percentage point of the distribution.
If $\sigma_1^2$, $\sigma_2^2$ and $\sigma_{12}$ are unknown then there is no exact test (Malinvaud,
1970, pp. 235-236). We can instead use the same test but replace $\chi^2$
above with the statistic

$\frac{(\alpha_{12}^* - \alpha_{21}^*)^2}{(\sigma_1^{*2} + 2\sigma_2^{*2} + 2\sigma_{12}^*)/100}$ (3.3.2)
where
$\begin{bmatrix} \sigma_1^{*2} & \sigma_{12}^* \\ \sigma_{12}^* & \sigma_2^{*2} \end{bmatrix} = M_{uu}^*$


This test is justified asymptotically because if the second moment matrix
$M_{xx}$ of the exogenous variables converges to a positive definite limit as the
sample size T → ∞, then $M_{uu}^*$ is a consistent estimator of

$\begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{bmatrix}$

and the limiting distribution of (3.3.2) is $\chi_1^2$ (c.f. Malinvaud, 1970, p. 235).
In the present case (3.3.2) is

$\frac{1}{0.11} \approx 9.09 > \chi_1^2(0.01) = 6.635$.


Hence, we reject $H_0$ at the 1% level of significance.


Remark We note that in Part (d) the hypothesis concerned the coefficients
of different exogenous variables in the two equations. In this case there is
no exact test. But, when the hypothesis concerns coefficients of the same
exogenous variable in the different equations there is an exact test based
on the F-distribution and called Hotelling's $T^2$-test (see Malinvaud, 1970b,
pp. 236-237).


Solution 3.4 


The given model is a linear multiple equation model similar in form to 
(3.0.1) but in which there are restrictions on the coefficients. In fact, the 
model is a special case of Zellner’s seemingly unrelated regression model 
(3.0.2) and, therefore, can be represented by 


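$y_m = X_m \beta_m + u_m \quad (m = 1, \ldots, M)$ (3.4.1)


where the $u_m$ satisfy $E(u_m) = 0$ and $E(u_m u_n') = \sigma_{mn} I_T$ and the matrix
Σ = [($\sigma_{mn}$)] is assumed to be positive definite.
Writing (3.4.1) as the system


y = Xβ + u
where
$y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_M \end{bmatrix}$, $X = \begin{bmatrix} X_1 & 0 & \cdots & 0 \\ 0 & X_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & X_M \end{bmatrix}$, $\beta = \begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_M \end{bmatrix}$ and $u = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_M \end{bmatrix}$


and noting that $E(uu') = \Sigma \otimes I_T$ we see that the best linear unbiased
estimator of β is obtained by generalised least squares and is given by
$\hat{\beta} = [X'(\Sigma^{-1} \otimes I_T) X]^{-1} [X'(\Sigma^{-1} \otimes I_T) y]$ (3.4.2)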


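In the given question we have M = 2 and


$\Sigma = \begin{bmatrix} \sigma^2 & \sigma^2 \\ \sigma^2 & 2\sigma^2 \end{bmatrix} = \sigma^2 \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}$,
$X_1' = \begin{bmatrix} x_{11} & \cdots & x_{1T} \\ x_{21} & \cdots & x_{2T} \end{bmatrix}$, $X_2' = \begin{bmatrix} x_{31} & \cdots & x_{3T} \\ x_{41} & \cdots & x_{4T} \end{bmatrix}$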

Hence, using (3.4.2) we obtain 


a 

b> jr (ee WOR | o2po2Bl Seals a 

ny {+ (| ' ,| elt (3.4.3) 
Ay, LA Ari gq” 2G ie uke 


x< 


male ve ll 
iia Lata ¢ is cd Ek a 


| 
| 

| : 

BES SE) ERE eo 


1 
x eXy XoX4 T —XoV1 + X52 
200 =O 0 Orie A 150 
_|_0 600 | OF 50 400 —200 
0 0 | 20 10 (aie r 30 
0 0 110) 10 100 —20 
1/200. 0 0 0 1f—550 —11/4 
ia 0 1/600 0 0 || 1000 —5/3 
0 Oh sede Onlec- 0. bl e170 mee lereen 
0 0 -—0.1 0.2//—120 5 
Notice that we have introduced the factor $1/T$ inside each of the braces in
(3.4.3) and we have cancelled the factor $\sigma^2$ (which comes outside the
matrix $\Sigma$ as a common factor) in passing from (3.4.4) to (3.4.5).


Solution 3.5 


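Part (a) From Solution 3.4 and, in particular, (3.4.2), the Aitken
estimator of $(\beta_1, \beta_2)$ is given by

$$ \begin{bmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \end{bmatrix} =
\begin{bmatrix} \sigma^{11}X_1'X_1 & \sigma^{12}X_1'X_2 \\ \sigma^{21}X_2'X_1 & \sigma^{22}X_2'X_2 \end{bmatrix}^{-1}
\begin{bmatrix} \sigma^{11}X_1'y_1 + \sigma^{12}X_1'y_2 \\ \sigma^{21}X_2'y_1 + \sigma^{22}X_2'y_2 \end{bmatrix} $$

where

$$ \Sigma^{-1} = \begin{bmatrix} \sigma^{11} & \sigma^{12} \\ \sigma^{21} & \sigma^{22} \end{bmatrix}
= \frac{1}{\sigma_{11}\sigma_{22} - \sigma_{12}^2}
\begin{bmatrix} \sigma_{22} & -\sigma_{12} \\ -\sigma_{12} & \sigma_{11} \end{bmatrix} $$

and, introducing $\rho = \sigma_{12}/(\sigma_{11}\sigma_{22})^{1/2}$, this matrix becomes

$$ \frac{1}{\sigma_{11}\sigma_{22}(1 - \rho^2)}
\begin{bmatrix} \sigma_{22} & -\sigma_{12} \\ -\sigma_{12} & \sigma_{11} \end{bmatrix}. $$

The covariance matrix of $(\hat{\beta}_1, \hat{\beta}_2)$ is

$$ \begin{bmatrix} \sigma^{11}X_1'X_1 & \sigma^{12}X_1'X_2 \\ \sigma^{21}X_2'X_1 & \sigma^{22}X_2'X_2 \end{bmatrix}^{-1} $$

(cf. Zellner, 1962, p. 351) and inverting this partitioned matrix we find
that the top left hand block is

$$ \left[ \sigma^{11}X_1'X_1 - \frac{(\sigma^{12})^2}{\sigma^{22}} X_1'X_2(X_2'X_2)^{-1}X_2'X_1 \right]^{-1} \qquad (3.5.1) $$

[see (2.1.10) for the formula for a partitioned inversion]. This matrix is
then the covariance matrix of $\hat{\beta}_1$.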


Part (b) When $X_1'X_2 = 0$, (3.5.1) reduces to

$$ (\sigma^{11})^{-1}(X_1'X_1)^{-1} = \sigma_{11}(1 - \rho^2)(X_1'X_1)^{-1}. \qquad (3.5.2) $$

The single equation least squares estimator of $\beta_1$ is $b_1 = (X_1'X_1)^{-1}X_1'y_1$
and has covariance matrix

$$ \sigma_{11}(X_1'X_1)^{-1}. \qquad (3.5.3) $$

When we subtract (3.5.2) from (3.5.3) we are left with the matrix
$\sigma_{11}\rho^2(X_1'X_1)^{-1}$, which is positive definite when $\rho \neq 0$. It follows
that when $\rho \neq 0$ the use of the Aitken (or generalised least squares)
procedure leads to an efficiency gain.
When $\rho = 0$, the two procedures are in fact equivalent. Finally, we may
note that when $X_1 = X_2$ the GLS estimator is identical to the OLS
estimator (see Question 3.1).
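The efficiency comparison in Part (b) can be illustrated numerically. The sketch below assumes NumPy and hypothetical regressors constructed so that $X_1'X_2 = 0$; the values of `s11`, `s22` and `rho` are arbitrary. It compares the top left block of the GLS covariance matrix with the single equation covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 30
# Orthonormal columns let us build X1 and X2 with X1'X2 = 0 (assumed design).
Q, _ = np.linalg.qr(rng.normal(size=(T, 4)))
X1 = Q[:, :2] @ np.array([[2.0, 1.0], [0.0, 1.5]])
X2 = Q[:, 2:] @ np.array([[1.0, 0.5], [0.0, 3.0]])

s11, s22, rho = 2.0, 0.5, 0.7
s12 = rho * np.sqrt(s11 * s22)
Sigma = np.array([[s11, s12], [s12, s22]])

# Covariance matrix of the Aitken estimator; keep the block for beta_1.
X = np.block([[X1, np.zeros((T, 2))], [np.zeros((T, 2)), X2]])
W = np.kron(np.linalg.inv(Sigma), np.eye(T))
cov_gls = np.linalg.inv(X.T @ W @ X)[:2, :2]

# Covariance matrix of single equation least squares, as in (3.5.3).
cov_ols = s11 * np.linalg.inv(X1.T @ X1)
gain = cov_ols - cov_gls
```

Under these assumptions `gain` should agree with $\sigma_{11}\rho^2(X_1'X_1)^{-1}$ up to rounding error.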


Solution 3.6 


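Part (a) The given model (3.6.1-2) is, once again, an instance of (3.4.1).
The best linear unbiased estimates of $\alpha$, $\beta$ and $\gamma$ are given by

$$ \begin{bmatrix} \hat{\alpha} \\ \hat{\beta} \\ \hat{\gamma} \end{bmatrix} =
\left\{ \frac{1}{T} \begin{bmatrix} X_1' & 0 \\ 0 & X_2' \end{bmatrix}
\begin{bmatrix} \sigma^2 I_T & 2\rho\sigma^2 I_T \\ 2\rho\sigma^2 I_T & 4\sigma^2 I_T \end{bmatrix}^{-1}
\begin{bmatrix} X_1 & 0 \\ 0 & X_2 \end{bmatrix} \right\}^{-1}
\left\{ \frac{1}{T} \begin{bmatrix} X_1' & 0 \\ 0 & X_2' \end{bmatrix}
\begin{bmatrix} \sigma^2 I_T & 2\rho\sigma^2 I_T \\ 2\rho\sigma^2 I_T & 4\sigma^2 I_T \end{bmatrix}^{-1}
\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} \right\} $$

$$ = \left\{ \frac{1}{4T\sigma^4(1 - \rho^2)}
\begin{bmatrix} 4\sigma^2 X_1'X_1 & -2\rho\sigma^2 X_1'X_2 \\ -2\rho\sigma^2 X_2'X_1 & \sigma^2 X_2'X_2 \end{bmatrix} \right\}^{-1}
\left\{ \frac{1}{4T\sigma^4(1 - \rho^2)}
\begin{bmatrix} 4\sigma^2 X_1'y_1 - 2\rho\sigma^2 X_1'y_2 \\ -2\rho\sigma^2 X_2'y_1 + \sigma^2 X_2'y_2 \end{bmatrix} \right\} \qquad (3.6.3) $$

$$ = \begin{bmatrix} \frac{4}{T}X_1'X_1 & -\frac{2\rho}{T}X_1'X_2 \\ -\frac{2\rho}{T}X_2'X_1 & \frac{1}{T}X_2'X_2 \end{bmatrix}^{-1}
\begin{bmatrix} \frac{4}{T}X_1'y_1 - \frac{2\rho}{T}X_1'y_2 \\ -\frac{2\rho}{T}X_2'y_1 + \frac{1}{T}X_2'y_2 \end{bmatrix} \qquad (3.6.4) $$

Substituting the given data into (3.6.4) yields

$$ \begin{bmatrix} \hat{\alpha} \\ \hat{\beta} \\ \hat{\gamma} \end{bmatrix}
= \begin{bmatrix} 88/184 \\ 72/184 \\ 1/5 \end{bmatrix}
= \begin{bmatrix} 11/23 \\ 9/23 \\ 1/5 \end{bmatrix}. $$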


Part (b) We notice that $\sigma^2$ comes out as a common factor of the elements
in the covariance matrix of the disturbances, viz

$$ \begin{bmatrix} \sigma^2 & 2\rho\sigma^2 \\ 2\rho\sigma^2 & 4\sigma^2 \end{bmatrix}
= \sigma^2 \begin{bmatrix} 1 & 2\rho \\ 2\rho & 4 \end{bmatrix} $$

and, as a result, $\sigma^2$ cancels in passing from (3.6.3) to (3.6.4). But, we
cannot, in general, calculate the best linear unbiased estimates from
(3.6.4) because $\rho$ is an unknown parameter. However, in the special case
of the data given in this problem, we find that

$$ X_1'X_2 = 0, \quad X_1'y_2 = 0 \quad \text{and} \quad X_2'y_1 = 0 \qquad (3.6.5) $$

so that the exogenous variables in each equation are orthogonal to all the
variables in the other equation. This means that the terms involving the
unknown parameter $\rho$ in (3.6.4) are all zero and we can, therefore,
proceed to calculate (3.6.4) in the present case without the knowledge of
$\rho$.

In fact, the estimates obtained in Part (a) are identical to the estimates
we would obtain by applying ordinary least squares to each equation. To
see this we need only observe that when (3.6.5) holds, (3.6.4) becomes

$$ \begin{bmatrix} \frac{4}{T}X_1'X_1 & 0 \\ 0 & \frac{1}{T}X_2'X_2 \end{bmatrix}^{-1}
\begin{bmatrix} \frac{4}{T}X_1'y_1 \\ \frac{1}{T}X_2'y_2 \end{bmatrix} $$

which reduces to

$$ \begin{bmatrix} (X_1'X_1)^{-1}X_1'y_1 \\ (X_2'X_2)^{-1}X_2'y_2 \end{bmatrix}. $$
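The claim that the orthogonality conditions (3.6.5) remove the dependence on $\rho$ can be checked on artificial data. This is a sketch only: the data below are hypothetical (they are not the data of Question 3.6) and are built from orthonormal vectors so that (3.6.5) holds by construction.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 12
Q, _ = np.linalg.qr(rng.normal(size=(T, 5)))          # orthonormal columns
X1 = Q[:, :2] @ np.array([[1.0, 0.5], [0.0, 2.0]])    # regressors of equation 1
X2 = Q[:, 2:3] * 3.0                                   # regressor of equation 2
y1 = X1 @ np.array([0.4, 0.3]) + 0.8 * Q[:, 3]         # orthogonal to X2
y2 = X2 @ np.array([0.2]) + 1.1 * Q[:, 4]              # orthogonal to X1

def gls(rho, sigma2=1.0):
    """GLS with disturbance covariance sigma2 * [[1, 2 rho], [2 rho, 4]]."""
    Sigma = sigma2 * np.array([[1.0, 2 * rho], [2 * rho, 4.0]])
    X = np.block([[X1, np.zeros((T, 1))], [np.zeros((T, 2)), X2]])
    W = np.kron(np.linalg.inv(Sigma), np.eye(T))
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ np.concatenate([y1, y2]))

# Equation-by-equation ordinary least squares.
ols = np.concatenate([np.linalg.lstsq(X1, y1, rcond=None)[0],
                      np.linalg.lstsq(X2, y2, rcond=None)[0]])
# GLS for several values of rho (any |rho| < 1 keeps Sigma positive definite).
estimates = {rho: gls(rho) for rho in (-0.6, 0.0, 0.3, 0.9)}
```

Every entry of `estimates` should coincide with `ols`, whatever value of `rho` is used.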
Remark We should emphasise that data for which (3.6.5) holds are very
special indeed. For the variables $y_{1t}$ and $y_{2t}$ are endogenous variables and
the two sets of equations

$$ X_1'y_2 = 0 \quad \text{and} \quad X_2'y_1 = 0 \qquad (3.6.6) $$

can be viewed as restrictions on the space of possible realisations of these
endogenous variables. In general, of course, (3.6.6) will not hold.


Solution 3.7


Part (a) When we use the words “efficient” and “asymptotically efficient”
in the context of an estimator (or estimation procedure) we mean that 
this estimator has a particular property or set of properties (relating to the 
degree of concentration its distribution — or asymptotic distribution — 
displays about the true value) which distinguishes it from all other 
members of a certain class. We must, therefore, be specific not only about 
the property (or properties) which distinguish such an estimator but also 
the class of estimators that we are considering. 

In the case of an efficient estimator, we consider the particular class of 
estimators which are unbiased and regular in the sense that the probability 
density of the sample observations satisfies certain regularity conditions 
(Dhrymes, 1970, p. 115 and Cramer, 1946, p. 479). Then, within this 
class an efficient estimator is an estimator whose variance (or covariance 
matrix in the case of a parameter vector) attains the Cramer—Rao lower 
bound (Dhrymes, 1970, p. 124; Theil, 1971, p. 386; and Cramer, 1946, 
p. 480). 

Consider, for instance, the case where the sample observations are
independent and identically distributed with probability density $f(x; \theta)$
depending on the scalar parameter $\theta$. The joint probability density of the
sample of $T$ observations $\{x_t : t = 1, \ldots, T\}$ is then

$$ L(x_1, \ldots, x_T; \theta) = \prod_{t=1}^{T} f(x_t; \theta) \qquad (3.7.3) $$

say. [Note that when we treat $L(\cdot)$ as a function of $\theta$ for the given values
of the sample $x_1, \ldots, x_T$ we call it the likelihood function.] If $\hat{\theta} =
\hat{\theta}(x_1, \ldots, x_T)$ is a regular, unbiased estimator of $\theta$ then we say that
$\hat{\theta}$ is an efficient estimator if the variance of $\hat{\theta}$ satisfies the equality

$$ \mathrm{var}(\hat{\theta}) = \frac{1}{-E\left[ \dfrac{\partial^2 \ln L(x_1, \ldots, x_T; \theta)}{\partial\theta^2} \right]} \qquad (3.7.4) $$

and the right hand side of (3.7.4) is known as the Cramér-Rao lower
bound since all regular, unbiased estimators of $\theta$ have variance which is at
least as great as the right side of (3.7.4). We notice that

$$ \ln L(x_1, \ldots, x_T; \theta) = \sum_{t=1}^{T} \ln f(x_t; \theta) $$

and since the sample observations are independent and identically
distributed

$$ E\left[ \frac{\partial^2 \ln L(x_1, \ldots, x_T; \theta)}{\partial\theta^2} \right]
= T\,E\left[ \frac{\partial^2 \ln f(x_t; \theta)}{\partial\theta^2} \right]. \qquad (3.7.5) $$

When $\theta$ is a vector of parameters we have in place of the right hand side of
(3.7.4) the inverse of the matrix

$$ R(\theta) = -E\left( \frac{\partial^2 \ln L}{\partial\theta\,\partial\theta'} \right) $$

and $R(\theta)$ is known as the information matrix.

When we consider examples in which the sample observations are not 
necessarily independent and identically distributed we must work 
explicitly with the joint probability density f(x,,...,x7;0) of the 
sample observations. The concept of efficiency embodied in the equality 
(3.7.4) then carries over as before but we do not get the simplification in 
the form of the right hand side of (3.7.4) that results from (3.7.5) in the 
independent and identically distributed case (see Cramer, 1946, pp. 
496—497, for a discussion of this generalisation). 
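As a small illustration of (3.7.4) and (3.7.5), consider independent Bernoulli observations with success probability $p$, an example chosen here for convenience and not taken from the question. The sketch below evaluates the lower bound by a finite-difference second derivative and compares it with the exact variance of the sample mean, $p(1 - p)/T$; the function names and the step size `h` are illustrative.

```python
import numpy as np

def log_f(x, p):
    """Log density of one Bernoulli(p) observation x in {0, 1}."""
    return x * np.log(p) + (1 - x) * np.log(1 - p)

def cramer_rao_bound(p, T, h=1e-4):
    """Right side of (3.7.4), using (3.7.5): -1 / (T * E[d^2 ln f / dp^2])."""
    expected_second = 0.0
    for x, prob in ((0, 1 - p), (1, p)):
        # Central finite difference for the second derivative with respect to p.
        d2 = (log_f(x, p + h) - 2 * log_f(x, p) + log_f(x, p - h)) / h**2
        expected_second += prob * d2
    return -1.0 / (T * expected_second)

p, T = 0.3, 40
bound = cramer_rao_bound(p, T)
var_sample_mean = p * (1 - p) / T   # exact variance of the sample mean
```

Here the sample mean is unbiased and its variance matches the bound, so it is an efficient estimator of $p$ in the sense described above.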

The given model (3.7.1—2) is an instance of the seemingly unrelated 
regression model (3.4.1) above, and the efficient estimator (according to 
the above sense) in that model is given by (3.4.2). This estimator is linear 
and it depends on the knowledge of the disturbance covariance matrix $\Sigma$.
To verify that (3.4.2) does, indeed, yield an efficient estimator we need
only note that, under the normality assumption of the problem, $\hat{\beta}$ as given
by (3.4.2) is the maximum likelihood estimator of $\beta$ and the result follows
from a theorem in Malinvaud (1970, Theorem 2, p. 175). 


Remark As Malinvaud (1970, p. 176) comments, we have in this case the 
stronger result due to Rao (1965) that the estimator is efficient in the 
wider class of all unbiased estimators. 

Efficient estimators exist only under very restrictive conditions (see, for 
instance, Cramer, 1946, p. 480) and in most cases of interest in 
econometrics we cannot expect that these conditions will be satisfied. For 
example, in the present model (3.7.1—2) we have seen in the last 
paragraph that the efficient estimator of the coefficients depends on the 
knowledge of the covariance matrix $\Sigma$; but, in many cases, the knowledge
of $\Sigma$ is not readily available and this estimator is, therefore, not
operational. This means that we cannot normally rely upon the
criterion of efficiency to discriminate between different estimators.
Ideally, we would like to be able to derive the finite sample distributions
of the different estimators we are considering and measure directly their
relative degrees of concentration about the true parameter value. But,
in most cases, it is a formidable task to extract the exact form of the finite
sample distribution of an econometric estimator. On the other hand,
many estimators (when suitably standardised) do have limiting normal
distributions (see Solution 2.16) and it is possible to compare two different 
estimators by using the variances (or covariance matrices) of their limiting 
distributions. The concept of “asymptotic efficiency” is, loosely speaking, 
based on such a comparison. 

We consider the class of consistent estimators whose limiting
distribution as the sample size $T \to \infty$ is normal [in a rigorous theory we
require also that the convergence to normality be uniform over compact
sub-sets of the admissible parameter space (Dhrymes, 1970, p. 129 and
Rao, 1963)]. Then, within this class, an asymptotically efficient estimator
is an estimator whose asymptotic variance (or covariance matrix in the
case of a parameter vector) attains the limit of the Cramér-Rao lower
bound. More specifically, in the case we considered earlier of independent
and identically distributed sample observations [leading to the joint
density (3.7.3)], the limit of the Cramér-Rao lower bound is

$$ \lim_{T \to \infty} \frac{-T}{E\left[ \partial^2 \ln L(x_1, \ldots, x_T; \theta)/\partial\theta^2 \right]}
= \frac{-1}{E\left[ \partial^2 \ln f(x_t; \theta)/\partial\theta^2 \right]} $$

Note that the earlier formula for the Cramér-Rao lower bound, given by
the right side of (3.7.4), has here been standardised with respect to the
sample size $T$ because we are concerned with the limiting distribution of
$\sqrt{T}(\hat{\theta} - \theta)$ where $\hat{\theta}$ is our estimate of $\theta$.


Remark We should not leave our discussion of asymptotic efficiency 
without some mention of the fact that there is by no means universal 
agreement about the right criteria for asymptotic efficiency. In fact, many 
different criteria have recently been suggested and the reader is 
recommended to consult the interesting articles by Rao (1960), 
Schmetterer (1966) and Wolfowitz (1966) for further details; recent 
discussions in the econometric literature are contained in Rothenberg 
(1973) and Madansky (1976). The type of criterion we have employed 
above is often referred to as BAN (or best asymptotically normal); i.e. in 
the class of estimators whose asymptotic distribution is normal we seek an 
estimator which has the smallest asymptotic variance. Several difficulties 
arise in this approach. The first is that BAN estimators do not, in general, 
exist unless we further restrict the class of estimators (usually by requiring 
uniform convergence to the limiting normal distribution, where uniform 
is taken to apply to the true position of the unknown parameter in the 
parameter space — we then talk of a CUAN estimator or an estimator 
which is consistent and uniformly asymptotically normal); an alternative 
approach which does not require us to further restrict the class of 
estimators we are considering is to accept that a BAN estimator need not 
exist everywhere in the parameter space but to show that the set of points 
in the parameter space at which we can do better is negligible (strictly 
speaking, is a set of Lebesgue measure zero). The latter approach is
successfully used by Bahadur (1964). 

A second difficulty with the BAN criterion is that we may well wish to 
consider alternative estimators whose limiting distributions are not 
necessarily normal; and the BAN criterion excludes such estimators as 
possible competitors. It becomes more difficult to compare the degree of 
concentration in the limiting distributions of two estimators if we have 
to deal with distributions other than the normal. The reader is referred to 
the recent article by Wolfowitz (1966) for a successful investigation of 
this problem. 

Finally, we bring the reader’s attention to the fact that most situations 
of interest in econometrics relate to the case where the sample 
observations are non-identically distributed and where we are dealing with 
a vector of parameters. The relevant statistical theory underlying the 
various concepts of asymptotic efficiency has not always been developed 
for this case. In this respect, the recent work by Barnett (1976) is very 
helpful. Barnett deals explicitly with the maximum likelihood estimator 
of a vector of parameters in the case of non-identically distributed 
observations arising from a multiple equation non-linear regression model 
(we will be considering such models in Chapter 5); and he shows how the 
proofs in Bahadur (1964) can be used to support the BAN criterion 
applied to the maximum likelihood estimator in this context. 


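Part (b) Asymptotically efficient estimates of $\alpha$, $\beta$ and $\gamma$ in the given
model are obtained by the following two-step procedure (compare
Malinvaud, 1970, pp. 294-295):


(i) Estimate $\alpha$, $\beta$ and $\gamma$ by using ordinary least squares on each equation
and construct the sample moment matrix of residuals from these
regressions,

$$ \Sigma^* = \begin{bmatrix} s_{11} & s_{12} \\ s_{21} & s_{22} \end{bmatrix}
= \frac{1}{T} \begin{bmatrix} \sum_{t=1}^{T} u_{1t}^{*2} & \sum_{t=1}^{T} u_{1t}^* u_{2t}^* \\
\sum_{t=1}^{T} u_{2t}^* u_{1t}^* & \sum_{t=1}^{T} u_{2t}^{*2} \end{bmatrix} $$

where $u_{1t}^* = y_{1t} - \alpha^* x_{1t}$, $u_{2t}^* = y_{2t} - \beta^* x_{1t} - \gamma^* x_{2t}$, and $\alpha^*$, $\beta^*$ and
$\gamma^*$ are the ordinary least squares estimates.

(ii) Estimate $\alpha$, $\beta$ and $\gamma$ by Zellner’s method as in (3.4.2) above but with
the estimated covariance matrix $\Sigma^*$ from step (i) in place of $\Sigma$.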
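The two steps can be sketched in code as follows. This is an illustration under stated assumptions: the model is taken to be $y_{1t} = \alpha x_{1t} + u_{1t}$, $y_{2t} = \beta x_{1t} + \gamma x_{2t} + u_{2t}$ (the form implied by the residual definitions in step (i)), the data are simulated, and the function name `zellner_two_step` is hypothetical.

```python
import numpy as np

def zellner_two_step(x1, x2, y1, y2):
    """Two-step estimates (alpha**, beta**, gamma**) and the matrix Sigma*."""
    T = len(y1)
    X1 = x1.reshape(-1, 1)              # regressor of equation 1
    X2 = np.column_stack([x1, x2])      # regressors of equation 2
    # Step (i): OLS on each equation and the residual moment matrix.
    a_star = np.linalg.lstsq(X1, y1, rcond=None)[0]
    bg_star = np.linalg.lstsq(X2, y2, rcond=None)[0]
    resid = np.column_stack([y1 - X1 @ a_star, y2 - X2 @ bg_star])
    Sigma_star = resid.T @ resid / T
    # Step (ii): GLS as in (3.4.2) with Sigma* in place of Sigma.
    X = np.block([[X1, np.zeros((T, 2))], [np.zeros((T, 1)), X2]])
    W = np.kron(np.linalg.inv(Sigma_star), np.eye(T))
    est = np.linalg.solve(X.T @ W @ X, X.T @ W @ np.concatenate([y1, y2]))
    return est, Sigma_star

# Simulated (hypothetical) data with correlated disturbances.
rng = np.random.default_rng(3)
T = 400
x1 = rng.normal(size=T)
x2 = rng.normal(size=T)
Sigma = np.array([[1.0, 0.8], [0.8, 2.0]])
U = rng.multivariate_normal(np.zeros(2), Sigma, size=T)
y1 = 1.5 * x1 + U[:, 0]
y2 = -0.5 * x1 + 2.0 * x2 + U[:, 1]
(alpha2, beta2, gamma2), Sigma_star = zellner_two_step(x1, x2, y1, y2)
```

The divisor $T$ (with no degrees of freedom correction) follows the definition of the moment matrix in step (i).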


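We denote the estimates that result from step (ii) by $\alpha^{**}$, $\beta^{**}$, $\gamma^{**}$ and, as
in (3.4.2), the Zellner estimates which do use $\Sigma$ by $\hat{\alpha}$, $\hat{\beta}$, $\hat{\gamma}$. Then, since $\Sigma^*$
converges in probability to $\Sigma$ (Dhrymes, 1970, pp. 157-158),

$$ \sqrt{T} \begin{bmatrix} \alpha^{**} - \alpha \\ \beta^{**} - \beta \\ \gamma^{**} - \gamma \end{bmatrix}
\quad \text{and} \quad
\sqrt{T} \begin{bmatrix} \hat{\alpha} - \alpha \\ \hat{\beta} - \beta \\ \hat{\gamma} - \gamma \end{bmatrix} $$

have the same limiting normal distribution (see Solution 2.16 and
Dhrymes, 1970, p. 165). In particular, the covariance matrix of this
limiting distribution is

$$ V = \operatorname*{plim}_{T \to \infty} \left[ \frac{1}{T} X'(\Sigma^{*-1} \otimes I_T)X \right]^{-1} $$

where

$$ X = \begin{bmatrix} x_1 & 0 & 0 \\ 0 & x_1 & x_2 \end{bmatrix}, \quad
x_1' = (x_{11}, \ldots, x_{1T}) \quad \text{and} \quad x_2' = (x_{21}, \ldots, x_{2T}). $$

Hence, writing $\Sigma^{*-1} = [(s^{ij})]$, we have

$$ V = \operatorname*{plim}_{T \to \infty}
\begin{bmatrix} s^{11}x_1'x_1/T & s^{12}x_1'x_1/T & s^{12}x_1'x_2/T \\
s^{21}x_1'x_1/T & s^{22}x_1'x_1/T & s^{22}x_1'x_2/T \\
s^{21}x_2'x_1/T & s^{22}x_2'x_1/T & s^{22}x_2'x_2/T \end{bmatrix}^{-1} $$

which, by Slutsky’s theorem (see p. 100 and Dhrymes, 1970, p. 111) is

$$ V = \begin{bmatrix}
\sigma^{11} \lim \dfrac{x_1'x_1}{T} & \sigma^{12} \lim \dfrac{x_1'x_1}{T} & \sigma^{12} \lim \dfrac{x_1'x_2}{T} \\
\sigma^{21} \lim \dfrac{x_1'x_1}{T} & \sigma^{22} \lim \dfrac{x_1'x_1}{T} & \sigma^{22} \lim \dfrac{x_1'x_2}{T} \\
\sigma^{21} \lim \dfrac{x_2'x_1}{T} & \sigma^{22} \lim \dfrac{x_2'x_1}{T} & \sigma^{22} \lim \dfrac{x_2'x_2}{T}
\end{bmatrix}^{-1} \qquad (3.7.6) $$

where $\Sigma^{-1} = [(\sigma^{ij})]$ and the limits are taken as $T \to \infty$.
With the given data, (3.7.6) becomes

$$ V = \begin{bmatrix} 2\sigma^{11} & 2\sigma^{12} & \sigma^{12} \\
2\sigma^{21} & 2\sigma^{22} & \sigma^{22} \\
\sigma^{21} & \sigma^{22} & \sigma^{22} \end{bmatrix}^{-1}. $$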

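The question concentrates on the asymptotic variance of $\sqrt{T}(\alpha^{**} - \alpha)$
and $\sqrt{T}(\gamma^{**} - \gamma)$. The former is given by

$$ v_{11} = \frac{1}{\Delta} \begin{vmatrix} 2\sigma^{22} & \sigma^{22} \\ \sigma^{22} & \sigma^{22} \end{vmatrix}
= \frac{(\sigma^{22})^2}{\Delta} $$

where

$$ \Delta = \begin{vmatrix} 2\sigma^{11} & 2\sigma^{12} & \sigma^{12} \\
2\sigma^{21} & 2\sigma^{22} & \sigma^{22} \\
\sigma^{21} & \sigma^{22} & \sigma^{22} \end{vmatrix}
= 2\sigma^{22}[\sigma^{22}\sigma^{11} - (\sigma^{12})^2]. $$

But

$$ \begin{bmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix}
= \frac{1}{\sigma^{11}\sigma^{22} - (\sigma^{12})^2}
\begin{bmatrix} \sigma^{22} & -\sigma^{12} \\ -\sigma^{12} & \sigma^{11} \end{bmatrix} \qquad (3.7.7) $$

so that

$$ \sigma^{22}/[\sigma^{11}\sigma^{22} - (\sigma^{12})^2] = \sigma_1^2 \qquad (3.7.8) $$

and thus

$$ v_{11} = \sigma_1^2/2. $$

In the same way we find the asymptotic variance of $\sqrt{T}(\gamma^{**} - \gamma)$:

$$ v_{33} = \frac{1}{\Delta} \begin{vmatrix} 2\sigma^{11} & 2\sigma^{12} \\ 2\sigma^{21} & 2\sigma^{22} \end{vmatrix}
= \frac{4[\sigma^{11}\sigma^{22} - (\sigma^{12})^2]}{2\sigma^{22}[\sigma^{11}\sigma^{22} - (\sigma^{12})^2]}
= \frac{2}{\sigma^{22}}. $$

Using (3.7.8) and the fact that $\sigma_1^2\sigma_2^2(1 - \rho^2) = 1/[\sigma^{11}\sigma^{22} - (\sigma^{12})^2]$,
obtained by taking determinants of (3.7.7) we see that

$$ v_{33} = \frac{2\sigma_1^2\sigma_2^2(1 - \rho^2)}{\sigma_1^2} = 2\sigma_2^2(1 - \rho^2). $$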
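Turning now to the ordinary least squares estimates $\alpha^*$ and $\gamma^*$, we
know that the variance of the limiting distribution of $\sqrt{T}(\alpha^* - \alpha)$ is given
by (see Solution 2.16)

$$ \sigma_1^2 \left( \lim_{T \to \infty} \frac{x_1'x_1}{T} \right)^{-1} = \frac{\sigma_1^2}{2} $$

and the variance of the limiting distribution of $\sqrt{T}(\gamma^* - \gamma)$ is given by

$$ \sigma_2^2 \left[ \left( \lim_{T \to \infty} \frac{1}{T}
\begin{bmatrix} x_1'x_1 & x_1'x_2 \\ x_2'x_1 & x_2'x_2 \end{bmatrix} \right)^{-1} \right]_{22}
= \sigma_2^2 \left[ \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}^{-1} \right]_{22} = 2\sigma_2^2. $$

We note in particular that $2\sigma_2^2 > v_{33} = 2\sigma_2^2(1 - \rho^2)$ when $\rho \neq 0$ but that
$\sigma_1^2/2 = v_{11}$. Thus, the asymptotic variance of $\sqrt{T}(\alpha^* - \alpha)$ and $\sqrt{T}(\alpha^{**} - \alpha)$
is the same, whereas the asymptotic variance of $\sqrt{T}(\gamma^* - \gamma)$ is greater
than that of $\sqrt{T}(\gamma^{**} - \gamma)$ provided the errors on the equations are
correlated.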
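These asymptotic variances can be verified numerically. The sketch below assumes the limiting moments $\lim x_1'x_1/T = 2$, $\lim x_1'x_2/T = 1$ and $\lim x_2'x_2/T = 1$, which are the values consistent with the results $\sigma_1^2/2$ and $2\sigma_2^2$ quoted above; the values of `sigma1`, `sigma2` and `rho` are arbitrary illustrations.

```python
import numpy as np

def asymptotic_cov(sigma1, sigma2, rho, m11=2.0, m12=1.0, m22=1.0):
    """Limiting covariance matrix of the two-step estimator, as in (3.7.6)."""
    Sigma = np.array([[sigma1**2, rho * sigma1 * sigma2],
                      [rho * sigma1 * sigma2, sigma2**2]])
    S = np.linalg.inv(Sigma)   # elements sigma^{ij}
    A = np.array([[S[0, 0] * m11, S[0, 1] * m11, S[0, 1] * m12],
                  [S[1, 0] * m11, S[1, 1] * m11, S[1, 1] * m12],
                  [S[1, 0] * m12, S[1, 1] * m12, S[1, 1] * m22]])
    return np.linalg.inv(A)

sigma1, sigma2, rho = 1.3, 0.7, 0.5
V = asymptotic_cov(sigma1, sigma2, rho)

# Limiting variances of the ordinary least squares estimates.
M = np.array([[2.0, 1.0], [1.0, 1.0]])
ols_var_alpha = sigma1**2 / 2.0
ols_var_gamma = sigma2**2 * np.linalg.inv(M)[1, 1]
```

With these inputs `V[0, 0]` should equal `ols_var_alpha`, while `V[2, 2]` should be smaller than `ols_var_gamma` by the factor $1 - \rho^2$.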

The reason for this is that all prior restrictions on the given system are 
contained in equation 1. When we estimate this equation by least squares 
we take account of this restriction and do not neglect any prior
information by ignoring equation 2. On the other hand, when we estimate
equation 2 by least squares we fail to take account of the prior restriction
on equation 1 and these estimates are therefore not as efficient
asymptotically as $\beta^{**}$ and $\gamma^{**}$.


Solution 3.8 


The given model is a case of the multiple equation model (3.0.1). The
coefficient matrix $A = [(a_{ij})]$ here has the form

$$ A = \begin{bmatrix} \alpha & \beta \\ \beta & \gamma \end{bmatrix} $$

so that the model involves the cross equation restriction $a_{12} = a_{21} = \beta$.
The best linear unbiased estimates of $\alpha$, $\beta$ and $\gamma$ are then obtained by
constrained generalised least squares (see Theil, 1971, pp. 282-289; and
for the particular example of cross equation constraints in the multivariate
model see Theil, 1971, pp. 312-317).

The general formula for the constrained generalised least squares 
estimator is a little complicated and it is worthwhile casting the model in 
the framework of (3.0.3) so that the constraint is incorporated and 
generalised least squares can be applied directly. We can do this by writing 
equations (3.8.1—2) in the form 


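$$ y_t = X_t\delta + u_t \qquad (3.8.3) $$

where $y_t' = (y_{1t}, y_{2t})$, $u_t' = (u_{1t}, u_{2t})$, $\delta' = (\alpha, \beta, \gamma)$ and

$$ X_t = \begin{bmatrix} x_{1t} & x_{2t} & 0 \\ 0 & x_{1t} & x_{2t} \end{bmatrix}. $$

The system (3.8.3) is now an example of the general linear model
discussed in Question 2.12 and by Malinvaud (1970, pp. 289-296). We
note that the covariance matrix of $u_t$ is given by

$$ E(u_t u_t') = \Omega = \sigma^2 I $$

and the best linear unbiased estimator of $\delta$ is (Malinvaud, 1970b, p. 293)

$$ \hat{\delta} = \left( \frac{1}{T} \sum_{t=1}^{T} X_t'\Omega^{-1}X_t \right)^{-1}
\left( \frac{1}{T} \sum_{t=1}^{T} X_t'\Omega^{-1}y_t \right) \qquad (3.8.4) $$

which, from the form of $\Omega^{-1} = (1/\sigma^2)I$, reduces to

$$ \hat{\delta} = \left( \frac{1}{T} \sum_{t=1}^{T} X_t'X_t \right)^{-1}
\left( \frac{1}{T} \sum_{t=1}^{T} X_t'y_t \right). \qquad (3.8.5) $$

Now

$$ \frac{1}{T} \sum_{t=1}^{T} X_t'X_t = \frac{1}{T} \sum_{t=1}^{T}
\begin{bmatrix} x_{1t}^2 & x_{1t}x_{2t} & 0 \\
x_{2t}x_{1t} & x_{2t}^2 + x_{1t}^2 & x_{1t}x_{2t} \\
0 & x_{2t}x_{1t} & x_{2t}^2 \end{bmatrix} $$

and, using the given data, this matrix becomes

$$ \begin{bmatrix} 10 & 5 & 0 \\ 5 & 20 + 10 & 5 \\ 0 & 5 & 20 \end{bmatrix}
= \begin{bmatrix} 10 & 5 & 0 \\ 5 & 30 & 5 \\ 0 & 5 & 20 \end{bmatrix}. $$

Similarly, we find

$$ \frac{1}{T} \sum_{t=1}^{T} X_t'y_t = \frac{1}{T} \sum_{t=1}^{T}
\begin{bmatrix} x_{1t}y_{1t} \\ x_{2t}y_{1t} + x_{1t}y_{2t} \\ x_{2t}y_{2t} \end{bmatrix}
= \begin{bmatrix} 20 \\ 70 \\ 20 \end{bmatrix}. $$

Returning to (3.8.5) we now have

$$ \begin{bmatrix} \hat{\alpha} \\ \hat{\beta} \\ \hat{\gamma} \end{bmatrix}
= \begin{bmatrix} 10 & 5 & 0 \\ 5 & 30 & 5 \\ 0 & 5 & 20 \end{bmatrix}^{-1}
\begin{bmatrix} 20 \\ 70 \\ 20 \end{bmatrix}
= \frac{1}{5250} \begin{bmatrix} 575 & -100 & 25 \\ -100 & 200 & -50 \\ 25 & -50 & 275 \end{bmatrix}
\begin{bmatrix} 20 \\ 70 \\ 20 \end{bmatrix}
= \begin{bmatrix} 500/525 \\ 1100/525 \\ 250/525 \end{bmatrix}. $$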
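The arithmetic of this solution is easy to reproduce. The sketch below assumes the sample moment matrices are those read from the solution, namely rows (10, 5, 0), (5, 30, 5), (0, 5, 20) and the vector (20, 70, 20); the helper `constrained_ls`, which stacks the two equations with the restriction imposed, is an illustrative name and is exercised here only on made-up series.

```python
import numpy as np

# Sample moments as read from the solution (assumed values).
moment_xx = np.array([[10.0, 5.0, 0.0],
                      [5.0, 30.0, 5.0],
                      [0.0, 5.0, 20.0]])
moment_xy = np.array([20.0, 70.0, 20.0])
alpha_hat, beta_hat, gamma_hat = np.linalg.solve(moment_xx, moment_xy)

def constrained_ls(x1, x2, y1, y2):
    """Least squares on y_t = X_t delta + u_t with
    X_t = [[x1t, x2t, 0], [0, x1t, x2t]] and delta = (alpha, beta, gamma)."""
    zeros = np.zeros_like(x1)
    X = np.vstack([np.column_stack([x1, x2, zeros]),   # rows of equation 1
                   np.column_stack([zeros, x1, x2])])  # rows of equation 2
    y = np.concatenate([y1, y2])
    return np.linalg.solve(X.T @ X, X.T @ y)

# Noise-free, made-up series: the restricted estimator recovers delta exactly.
x1 = np.array([1.0, 2.0, -1.0, 0.5, 3.0])
x2 = np.array([0.0, 1.0, 2.0, -1.5, 1.0])
delta = np.array([0.9, 2.1, 0.5])
recovered = constrained_ls(x1, x2,
                           delta[0] * x1 + delta[1] * x2,
                           delta[1] * x1 + delta[2] * x2)
```

Stacking the rows by equation instead of by observation leaves $\sum_t X_t'X_t$ and $\sum_t X_t'y_t$ unchanged, which is why a single call to a linear solver suffices.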
Remark The estimates $\hat{\alpha}$, $\hat{\beta}$ and $\hat{\gamma}$ obtained above are different from the
estimates we would obtain by the application of ordinary least squares to
each equation of (3.8.1-2). Indeed, the use of ordinary least squares in
this way will, in general, produce two different estimates of the parameter
$\beta$, one from each of the two regressions (the reader is encouraged to verify
this in the present case by using the given data); and, moreover, the
ordinary least squares estimates will not be the best linear unbiased
estimates. We should emphasize that this is true in the present case despite
the fact that the errors on the two equations (3.8.1) and (3.8.2) are
uncorrelated (compare the point we made earlier in Part (c) of Solution 
3.1). Thus, when there are parameter restrictions across equations we need 
to employ some form of generalised least squares on the system of 
equations as a whole if we are to obtain the best linear unbiased estimates; 
and this is true even when the errors on the different equations are 
uncorrelated. 


Solution 3.9 


Part (a) The given system is another example of the model (3.1.1) in 
which the coefficient matrix A is constrained. As before we can proceed 
either by using constrained generalised least squares (see the references we 
gave in Solution 3.8) or by recasting the model so that the constraints are 
explicitly incorporated. Taking the latter approach we set up the system 


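$$ \begin{bmatrix} y_{1t} \\ y_{2t} \end{bmatrix}
= \begin{bmatrix} x_{1t} & x_{2t} \\ x_{3t} & x_{4t} \end{bmatrix}
\begin{bmatrix} \alpha \\ \beta \end{bmatrix}
+ \begin{bmatrix} u_{1t} \\ u_{2t} \end{bmatrix} $$

or $y_t = X_t\delta + u_t$ in notation similar to that of (3.8.3). The best linear
unbiased estimates of $\alpha$ and $\beta$ are then given by (Malinvaud, 1970b, p.
293):

$$ \hat{\delta} = \left( \sum_{t=1}^{T} X_t'\Omega^{-1}X_t \right)^{-1} \left( \sum_{t=1}^{T} X_t'\Omega^{-1}y_t \right)
= \left( \sum_{t=1}^{T} X_t'X_t \right)^{-1} \left( \sum_{t=1}^{T} X_t'y_t \right). \qquad (3.9.1) $$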
The second step above follows from the form of $\Omega = \sigma^2 I$, which results in
the cancellation of the $(1/\sigma^2)$ which occurs in each of the factors on the
right hand side of (3.9.1) above. Using the given data, we now get

$$ \begin{bmatrix} \hat{\alpha} \\ \hat{\beta} \end{bmatrix}
= \begin{bmatrix} 6 & 1 + 4 \\ 1 + 4 & 7 \end{bmatrix}^{-1} \begin{bmatrix} 10 \\ 11 \end{bmatrix}
= \frac{1}{17} \begin{bmatrix} 7 & -5 \\ -5 & 6 \end{bmatrix} \begin{bmatrix} 10 \\ 11 \end{bmatrix}
= \begin{bmatrix} 15/17 \\ 16/17 \end{bmatrix}. $$

Since the best linear unbiased estimator of $\gamma$ is $\hat{\gamma} = \hat{\alpha} + \hat{\beta}$ we have

$$ \hat{\gamma} = 31/17. $$

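Part (b)

$$ \mathrm{var}(\hat{\gamma}) = \mathrm{var}(\hat{\alpha} + \hat{\beta})
= \mathrm{var}(\hat{\alpha}) + 2\,\mathrm{cov}(\hat{\alpha}, \hat{\beta}) + \mathrm{var}(\hat{\beta}). $$

Hence to find $\mathrm{var}(\hat{\gamma})$ we need the covariance matrix of $(\hat{\alpha}, \hat{\beta})$, which is
given by (see Malinvaud, 1970, p. 293)

$$ \left( \sum_{t=1}^{T} X_t'\Omega^{-1}X_t \right)^{-1} = \sigma^2 \left( \sum_{t=1}^{T} X_t'X_t \right)^{-1}
= \frac{\sigma^2}{17} \begin{bmatrix} 7 & -5 \\ -5 & 6 \end{bmatrix} $$

so that

$$ \mathrm{var}(\hat{\gamma}) = \frac{\sigma^2}{17}(7 - 10 + 6) = \frac{3\sigma^2}{17}. $$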


Solution 3.10 


The model (3.10.1) is an example of a multiple equation regression model
in which the dependent variables are constrained. The constraints are

$$ \iota'y_h = M_h \qquad (h = 1, \ldots, H) \qquad (3.10.2) $$

and result from the fact that the commodity set is exhaustive in that it
includes every commodity (including savings) purchased by households in
the given period. Models of this type have been used in many empirical
applications (see especially Prais and Houthakker, 1955 and Stone, 1954);
and some of the technical details of statistical inference in such models are
treated in McGuire et al. (1968).


Part (a) Since $\iota'y_h = M_h$ and $M_h$ is a scalar it follows that

$$ \iota'y_h = (\iota'\beta)M_h + \iota'u_h = M_h \qquad (3.10.3) $$

and taking expectations we obtain

$$ (\iota'\beta)M_h = M_h $$

so that

$$ \iota'\beta = 1. \qquad (3.10.4) $$


Part (b) From (3.10.3) and (3.10.4) we have

$$ M_h + \iota'u_h = M_h $$

so that $\iota'u_h = 0$. Hence

$$ V\iota = E(u_hu_h')\iota = E(u_hu_h'\iota) = 0 $$

and $\iota$ lies in the null space of $V$ (see Halmos, 1958, p. 88 for the definition
of the term null space). Since $\iota \neq 0$ the columns of $V$ are linearly
dependent and $V$ is singular.

If the rank of $V$ is $n - 1$ then the nullity (or dimension of the null
space) of $V$ must be unity since

rank + nullity = number of columns of $V$ = $n$

(see Halmos, 1958, p. 90). But the non-zero vector $\iota$ lies in the null space
of $V$ so that $\iota$ is a basis of this space.


Part (c) We let $u_h^{(i)}$ be the vector obtained from $u_h$ by deleting the $i$th
component and set $V^{(i)} = E(u_h^{(i)}u_h^{(i)\prime})$. Hence, $u_h^{(i)}$ is the error vector on the
sub-system of the given model obtained by deleting the $i$th equation.

Let $a = (a_j)$ be any $(n - 1)$-component column vector for which
$a'V^{(i)}a = 0$. Then

$$ a'V^{(i)}a = E(a'u_h^{(i)}u_h^{(i)\prime}a) = E(a'u_h^{(i)})^2 = 0 $$

so that $a'u_h^{(i)} = 0$. Hence

$$ a'u_h^{(i)} + 0 \cdot u_{ih} = 0 $$

or

$$ a^{*\prime}u_h = 0 $$

where $a^{*\prime} = (a_1, a_2, \ldots, a_{i-1}, 0, a_i, \ldots, a_{n-1})$. It now follows that

$$ Va^* = E(u_hu_h')a^* = E(u_hu_h'a^*) = 0 $$

so that $a^*$ lies in the null space of $V$. But this sub-space is spanned by the
vector $\iota$ and, therefore, $a^* = \lambda\iota$ for some scalar $\lambda$. Since the $i$th component
of $a^*$ is zero, $\lambda$ must also be zero and thus $a = 0$. It follows that $V^{(i)}$ is
non-singular.

The constrained generalised least squares estimator of $\beta$ is obtained by
minimising with respect to the elements of $\beta$ the quadratic form

$$ Q(\beta) = \sum_{h=1}^{H} u_h^{(i)\prime} V^{(i)-1} u_h^{(i)} $$

subject to the restriction (3.10.4).

A general proof that this estimator is invariant with respect to the
choice of equation deleted from the system is given in McGuire et al.
(1968, pp. 1205-1206). In the present example we note that $Q(\beta)$ does
not depend on $\beta_i$ since $u_h^{(i)}$ has components $y_{jh} - \beta_jM_h$ for $j \neq i$. Hence,
the constraint $\iota'\beta = 1$ can be neglected in estimation. Moreover, since the
same regressor $M_h$ occurs in each equation of the sub-system, generalised
least squares is equivalent to the application of least squares to each
equation in turn (see Solution 3.1; Dhrymes, 1970, p. 161; Theil, 1971,
pp. 309-310; or Goldberger, 1964, p. 263). This leads to the estimates

$$ \hat{\beta}_j = \left( \sum_{h=1}^{H} M_h^2 \right)^{-1} \left( \sum_{h=1}^{H} M_hy_{jh} \right) \qquad (j \neq i) $$

and then

$$ \hat{\beta}_i = 1 - \sum_{j \neq i} \hat{\beta}_j
= \left( \sum_{h=1}^{H} M_h^2 \right)^{-1} \left[ \sum_{h=1}^{H} M_h^2 - \sum_{h=1}^{H} M_h \sum_{j \neq i} y_{jh} \right]
= \left( \sum_{h=1}^{H} M_h^2 \right)^{-1} \left( \sum_{h=1}^{H} M_hy_{ih} \right). \qquad (3.10.5) $$

The above estimators will be the same whichever equation we choose to
neglect initially. Furthermore, from (3.10.5) we see that $\hat{\beta}_i$ is the same as
the least squares estimator of $\beta_i$ obtained from the $i$th equation. It follows
that an unconstrained least squares regression on each equation of the
model leads to estimates of the coefficients which satisfy the constraint
$\iota'\hat{\beta} = 1$. This can, of course, be seen directly from the formula

$$ \hat{\beta}_j = \left( \sum_{h=1}^{H} M_h^2 \right)^{-1} \left( \sum_{h=1}^{H} M_hy_{jh} \right) \qquad (j = 1, \ldots, n) $$

and the dependent variable constraint

$$ \iota'y_h = M_h. $$

Solution 3.11 


Part (a) The given model is another example in which the dependent
variables are constrained to satisfy a linear relation. Applying ordinary
least squares to each equation we obtain

$$ \hat{\delta}_i = (Z'Z)^{-1}Z'y_i $$

so that the calculated values of $y_i$ are given by

$$ \hat{y}_i = Z\hat{\delta}_i = Z(Z'Z)^{-1}Z'y_i. $$

Using (3.11.1), we now have

$$ \sum_i a_i\hat{y}_i = Z(Z'Z)^{-1}Z' \left( \sum_i a_iy_i \right) = 0 $$

so that the calculated values of $y_i$ do indeed satisfy the constraint.


Part (b) If the constraint on the dependent variables were changed to
(3.11.2) then

$$ \sum_i a_i\hat{y}_i = Z(Z'Z)^{-1}Z' \left( \sum_i a_iy_i \right) = Z(Z'Z)^{-1}Z'b $$

which equals $b$ if and only if $b$ lies in the range space of $Z$. For if $b$ lies in
the range space of $Z$, then $b = Z\gamma$ for some $(m + 1)$-component column
vector $\gamma$; and clearly $Z(Z'Z)^{-1}Z'(Z\gamma) = Z\gamma$. This proves the sufficient
condition. To prove the necessary condition we note that, if $b = Z(Z'Z)^{-1}Z'b$,
then $b = Zc$ where $c = (Z'Z)^{-1}Z'b$, so that $b$ lies in the range space of
$Z$.
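The range space condition in Part (b) amounts to asking whether the projection matrix $Z(Z'Z)^{-1}Z'$ leaves $b$ unchanged. The following sketch uses a made-up regressor matrix (with a column of ones, which is an assumption) and made-up weights `a` to illustrate both cases.

```python
import numpy as np

rng = np.random.default_rng(5)
T, m = 10, 2
Z = np.column_stack([np.ones(T), rng.normal(size=(T, m))])   # T x (m + 1)
P = Z @ np.linalg.solve(Z.T @ Z, Z.T)                        # Z (Z'Z)^{-1} Z'

b_inside = Z @ np.array([1.0, -2.0, 0.5])   # b = Z gamma lies in the range space
b_outside = rng.normal(size=T)              # a generic b does not

# Three dependent variables constrained so that sum_i a_i y_i = b_inside.
a = np.array([0.5, 1.0, 2.0])
Y = rng.normal(size=(T, 3))
Y[:, 2] = (b_inside - Y[:, :2] @ a[:2]) / a[2]
fitted = P @ Y                              # calculated values, column by column
```

Here `fitted @ a` should reproduce `b_inside`, whereas `P @ b_outside` should differ from `b_outside`.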


Solution 3.12 


Part (a) Aggregating equation (3.12.1) over the $n$ households we obtain

$$ \sum_{i=1}^{n} y_{it} = \sum_{j=1}^{k} \sum_{i=1}^{n} \beta_{ij}x_{ijt} + \sum_{i=1}^{n} u_{it} $$

or

$$ y_t = \sum_{j=1}^{k} \beta_{jt}x_{jt} + u_t \qquad (3.12.3) $$

where

$$ \beta_{jt} = \sum_{i=1}^{n} \beta_{ij}x_{ijt}/x_{jt} = \sum_{i=1}^{n} \beta_{ij}\lambda_{ijt} \qquad (3.12.4) $$

and $\lambda_{ijt} = x_{ijt}/x_{jt}$ which is the proportion of $x_{ijt}$ in the aggregate $x_{jt}$.
Equation (3.12.2) equals (3.12.3) and hence is a valid linear aggregation of
(3.12.1) if $\beta_{jt} = \beta_j$ for all $j$, $t$. Two conditions are sufficient to ensure this
holds but neither is necessary. First, if $\beta_{ij} = \beta_j$ for all $i$, $j$, then $\beta_{jt} = \beta_j$. In
other words, if the micro-equations have the same coefficients, then the
macro-equation (3.12.2) is a valid aggregation. Second, if $\lambda_{ijt}$ does not
depend upon time (i.e. $\lambda_{ijt} = \lambda_{ij}$ for all $i$, $j$, $t$) then again $\beta_{jt} = \beta_j$, but now $\beta_j$
is a weighted average of the $n$ coefficients $\beta_{ij}$ with weights $\lambda_{ij}$
(i.e. $\beta_j = \sum_{i=1}^{n} \beta_{ij}\lambda_{ij}$).


Part (b) Equation (3.12.1) can be written for the $i$th household at time $t$
as

$$ y_{it} = x_{it}'\delta_i + u_{it} \qquad (i = 1, \ldots, n;\ t = 1, \ldots, T) \qquad (3.12.5) $$

where $x_{it}' = (x_{i1t}, \ldots, x_{ikt})$, $\delta_i' = (\beta_{i1}, \ldots, \beta_{ik})$. Combining all households
at time $t$ into one equation we have

$$ y_t = X_t\delta + u_t \qquad (t = 1, \ldots, T) \qquad (3.12.6) $$

where $y_t' = (y_{1t}, \ldots, y_{nt})$, $u_t' = (u_{1t}, \ldots, u_{nt})$, $\delta' = (\delta_1', \ldots, \delta_n')$,

$$ X_t = \begin{bmatrix} x_{1t}' & & 0 \\ & \ddots & \\ 0 & & x_{nt}' \end{bmatrix} $$

and the $u_t$ are distributed as independent $N(0, \Sigma)$ variables with

$$ \Sigma = \begin{bmatrix} \sigma_{11} & \cdots & \sigma_{1n} \\ \vdots & & \vdots \\ \sigma_{n1} & \cdots & \sigma_{nn} \end{bmatrix}. $$

We can now write the null hypothesis $\beta_{ij} = \beta_j$ alternatively as $\delta_i = \beta$ for all
$i = 1, \ldots, n$ where $\beta' = (\beta_1, \ldots, \beta_k)$. In this case equation (3.12.6) can
be rewritten as

$$ y_t = Z_t\beta + u_t \qquad (t = 1, \ldots, T) \qquad (3.12.7) $$

where $Z_t' = (x_{1t}, \ldots, x_{nt})$. Thus on the null hypothesis, $H_0$, equation
(3.12.7) is correct but on the alternative hypothesis, $H_1$, we require
(3.12.6).

For all three cases (i)-(iii) we can use a likelihood ratio test (see pp.
65-6). The log likelihood function can be written in each case and for
each equation as

$$ L = -\tfrac{1}{2}nT\ln(2\pi) + T\ln|\Sigma|^{-1/2} - \tfrac{1}{2}\sum_{t=1}^{T} u_t'\Sigma^{-1}u_t \qquad (3.12.8) $$

On $H_0$ we obtain

$$ L_0 = c + T\ln|\Sigma|^{-1/2} - \sum_{t=1}^{T} (y_t - Z_t\beta)'\Sigma^{-1}(y_t - Z_t\beta)/2 \qquad (3.12.9) $$

and on $H_1$,

$$ L_1 = c + T\ln|\Sigma|^{-1/2} - \sum_{t=1}^{T} (y_t - X_t\delta)'\Sigma^{-1}(y_t - X_t\delta)/2 \qquad (3.12.10) $$

where $c = -\tfrac{1}{2}nT\ln(2\pi)$. The likelihood ratio test is then given by

$$ l = -2(\hat{L}_0 - \hat{L}_1) \qquad (3.12.11) $$

where $\hat{L}_0$ is $L_0$ evaluated using the maximum likelihood estimator (MLE)
of $\beta$ (and $\Sigma$ where appropriate) and $\hat{L}_1$ is $L_1$ evaluated using the MLE of
$\delta$ (and $\Sigma$).

The limiting distribution of $l$ is a $\chi^2_{(n-1)k}$ distribution since $\delta$ has $nk$
elements and $\beta$ only $k$ elements.


Case (i) In this case $\Sigma = \sigma^2 I$. The MLE’s of $\beta$ and $\delta$ are thus the OLS
estimators

$$ b = \left( \sum_{t=1}^{T} Z_t'Z_t \right)^{-1} \sum_{t=1}^{T} Z_t'y_t \qquad (3.12.12) $$

and

$$ d = \left( \sum_{t=1}^{T} X_t'X_t \right)^{-1} \sum_{t=1}^{T} X_t'y_t \qquad (3.12.13) $$

The MLE of $\sigma^2$ on $H_0$ is $s_0^2 = \sum_{t=1}^{T} \hat{u}_t'\hat{u}_t/nT$, where $\hat{u}_t = y_t - Z_tb$ and
the MLE of $\sigma^2$ on $H_1$ is $s_1^2 = \sum_{t=1}^{T} u_t^{*\prime}u_t^*/nT$, where $u_t^* = y_t - X_td$.
Substituting in (3.12.9) and (3.12.10) for the MLE’s just obtained we find
that

$$ \hat{L}_0 = c - nT\ln s_0 - nT/2 $$

and

$$ \hat{L}_1 = c - nT\ln s_1 - nT/2 $$

hence

$$ l = 2nT(\ln s_0 - \ln s_1). \qquad (3.12.14) $$
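The statistic (3.12.14) is straightforward to compute: under $H_0$ one pools all households in a single regression, and under $H_1$ the block-diagonal form of $X_t$ means that $d$ is obtained by a separate regression for each household. The sketch below assumes simulated data, and the function name `lr_statistic` and the array layout are illustrative choices.

```python
import numpy as np

def lr_statistic(x, y):
    """x has shape (n, T, k), y has shape (n, T).
    Returns the statistic l of (3.12.14) and its degrees of freedom (n - 1) k."""
    n, T, k = x.shape
    # H0: common coefficient vector, pooled OLS over households and periods.
    Z = x.reshape(n * T, k)
    y_all = y.reshape(n * T)
    b = np.linalg.lstsq(Z, y_all, rcond=None)[0]
    rss0 = np.sum((y_all - Z @ b) ** 2)
    # H1: a separate OLS regression for each household.
    rss1 = 0.0
    for i in range(n):
        d_i = np.linalg.lstsq(x[i], y[i], rcond=None)[0]
        rss1 += np.sum((y[i] - x[i] @ d_i) ** 2)
    s0_sq, s1_sq = rss0 / (n * T), rss1 / (n * T)
    l = n * T * (np.log(s0_sq) - np.log(s1_sq))   # equals 2nT(ln s0 - ln s1)
    return l, (n - 1) * k

# Simulated (hypothetical) data: one sample satisfying H0 and one violating it.
rng = np.random.default_rng(6)
n, T, k = 3, 40, 2
x = rng.normal(size=(n, T, k))
beta_common = np.array([1.0, -0.5])
y_hom = x @ beta_common + rng.normal(size=(n, T))
delta = np.array([[1.0, -0.5], [3.0, 0.5], [-2.0, 1.5]])
y_het = np.einsum("itk,ik->it", x, delta) + rng.normal(size=(n, T))
l_hom, df = lr_statistic(x, y_hom)
l_het, _ = lr_statistic(x, y_het)
```

The statistic would then be compared with a critical value of the $\chi^2$ distribution with `df` degrees of freedom; for 4 degrees of freedom the 1% point is approximately 13.28.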


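Case (ii) For $\Sigma$ known, the MLE’s of $\beta$ and $\delta$ are respectively the GLS
estimators

$$ \hat{\beta} = \left( \sum_{t=1}^{T} Z_t'\Sigma^{-1}Z_t \right)^{-1} \sum_{t=1}^{T} Z_t'\Sigma^{-1}y_t \qquad (3.12.15) $$

and

$$ \hat{\delta} = \left( \sum_{t=1}^{T} X_t'\Sigma^{-1}X_t \right)^{-1} \sum_{t=1}^{T} X_t'\Sigma^{-1}y_t \qquad (3.12.16) $$

It follows that

$$ \hat{L}_0 = c + T\ln|\Sigma|^{-1/2} - \sum_{t=1}^{T} (y_t - Z_t\hat{\beta})'\Sigma^{-1}(y_t - Z_t\hat{\beta})/2 \qquad (3.12.17) $$

$$ \hat{L}_1 = c + T\ln|\Sigma|^{-1/2} - \sum_{t=1}^{T} (y_t - X_t\hat{\delta})'\Sigma^{-1}(y_t - X_t\hat{\delta})/2 \qquad (3.12.18) $$

and hence

$$ l = \sum_{t=1}^{T} (y_t - Z_t\hat{\beta})'\Sigma^{-1}(y_t - Z_t\hat{\beta})
- \sum_{t=1}^{T} (y_t - X_t\hat{\delta})'\Sigma^{-1}(y_t - X_t\hat{\delta}) \qquad (3.12.19) $$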


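Case (iii) For $\Sigma$ unknown the above approach requires that we estimate
each model obtaining the MLE’s of $\beta$ and $\Sigma$ on $H_0$ and $\delta$ and $\Sigma$ on $H_1$. We
then substitute these estimators into (3.12.11) to obtain the test statistic

$$ l = T(\ln|\hat{\Sigma}_0| - \ln|\hat{\Sigma}_1|)
+ \sum_{t=1}^{T} (y_t - Z_t\tilde{\beta})'\hat{\Sigma}_0^{-1}(y_t - Z_t\tilde{\beta})
- \sum_{t=1}^{T} (y_t - X_t\tilde{\delta})'\hat{\Sigma}_1^{-1}(y_t - X_t\tilde{\delta}) \qquad (3.12.20) $$

$$ = T(\ln|\hat{\Sigma}_0| - \ln|\hat{\Sigma}_1|) $$

where

$$ \hat{\Sigma}_0 = \frac{1}{T} \sum_{t=1}^{T} (y_t - Z_t\tilde{\beta})(y_t - Z_t\tilde{\beta})'
\quad \text{and} \quad
\hat{\Sigma}_1 = \frac{1}{T} \sum_{t=1}^{T} (y_t - X_t\tilde{\delta})(y_t - X_t\tilde{\delta})' $$

are the MLE’s of $\Sigma$ on $H_0$ and $H_1$, respectively.

The estimators $\tilde{\beta}$ and $\tilde{\delta}$ are obtained from (3.12.15) and (3.12.16) as
before except that $\Sigma$ is replaced by $\hat{\Sigma}_0$ and $\hat{\Sigma}_1$, respectively. A
disadvantage of this test is that the MLE’s require full iteration until
convergence.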


CHAPTER 4 


Further aspects of the linear model 


0. INTRODUCTION 


In this chapter additional problems connected with the linear models of 
chapters 2 and 3 are discussed. The notation used is the same as in these 
previous chapters. 


1. QUESTIONS 


Question 4.1 
(a) For the linear model 
$$ y_t = x_{1t}\beta_1 + x_{2t}\beta_2 + u_t \qquad (t = 1, \ldots, T) \qquad (4.1.1) $$


where the variables are measured as deviations about their means, $E(u_t) =
0$, $E(u_t^2) = \sigma^2$ and $E(u_tu_s) = 0$ for $t \neq s$, show that the multiple
correlation coefficient satisfies the equation


eee Se 
Ree (4.1.2) 
Myy 


where $s^2$ is an estimator of $\sigma^2$.

(b) Show that an estimator of the variance of $b_i$, the OLS estimator of $\beta_i$,
can be written

peal Ri) Moy 
(1 —r)? mij 

where $r$ is the correlation coefficient of $x_1$ and $x_2$, $m_{yy}$ and $m_{ii}$ are sample
second moments of $y$ and $x_i$, respectively.
(c) Show why (4.1.3) is useful in explaining the effects of multicollinearity. 


var (b;) = (¢ = 1, 2) (4.1.3) 


Question 4.2
For the linear model 
$$ y = X\beta + u \qquad (4.2.1) $$


where $u$ is $N(0, \sigma^2 I_T)$:

(a) Express the variance of $b$, the OLS estimator of $\beta$, in terms of the
eigenvalues and eigenvectors of $X'X$.

(b) Explain how this expression can be used to demonstrate the effects of 
multicollinearity. 

(c) Show how a regression of y on the principal components of X can be 
used to derive a test for multicollinearity. 


Question 4.3 


(a) Obtain an expression for the bias of the OLS estimator of $\beta_1$ in the
linear model

$$ y = X_1\beta_1 + u_1 \qquad (4.3.1) $$

if the true model is

$$ y = X_1\beta_1 + X_2\beta_2 + u \qquad (4.3.2) $$

where $X_1$ is a $T \times k_1$ matrix, $X_2$ is a $T \times k_2$ matrix and $E(u) = 0$.

(b) What, if anything, can be said about the magnitude and direction of
the bias?

(c) If the residuals from the regression of $X_1$ on $X_2$ are denoted by $X_1^*$,
prove that the least squares estimator of $\beta_1$ in

$$ y = X_1^*\beta_1 + v \qquad (4.3.3) $$

is unbiased.


(University of London B Sc. (Econ) examinations, 1971) 


Question 4.4 


Consider the linear model 


$$y = X_1\beta_1 + X_2\beta_2 + u \tag{4.4.1}$$

where $E(u) = 0$ and $E(uu') = \sigma^2 I$. Suppose that an extraneous unbiased
estimator $b_1$ exists for the $k_1 \times 1$ coefficient vector $\beta_1$, that the
covariance matrix of $b_1$ is $V_1$ and that $b_1$ is uncorrelated with $u$.

(a) Derive the covariance matrix of the conditional regression estimator of
$\beta_2$ obtained from a regression of $y - X_1b_1$ on $X_2$.

(b) Compare the covariance matrix obtained in Part (a) with the
covariance matrix of the unrestricted OLS estimator of $\beta_2$ obtained from
(4.4.1).

(c) Show how a more efficient estimator of $\beta_2$ than those obtained in
parts (a) and (b) can be derived.

(d) Explain why these results may be useful in combatting 
multicollinearity. 


Question 4.5 
The true model is defined by 

$$y = X_1\beta_1 + u \tag{4.5.1}$$

with $E(u) = 0$, $E(uu') = \sigma^2 I_T$ but an investigator mistakenly assumes that
it is

$$y = X_1\beta_1 + X_2\beta_2 + v \tag{4.5.2}$$

where $X_1$ and $X_2$ are $T \times 1$ and $T \times k$ respectively.
(a) Prove that the OLS estimator of $\beta_1$ from (4.5.2) is unbiased but
inefficient.


(b) If $\hat{v}$ denotes the vector of residuals from (4.5.2), show that
$\hat{v}'\hat{v}/(T - 1 - k)$ is an unbiased estimator of $\sigma^2$.


Question 4.6 


(a) Show how it is possible to take account of seasonal variation in the 
linear model by using dummy variables and to obtain estimates of all of 
the seasonal dummies. 

(b) The following regression was estimated using 16 quarterly observations 
(t ratios are in parentheses) 


Y, = 70.7 — 0.90 + 0.43S,, + 6.558 — 2.83Sy (R? = 0.68) 
(3.7) (0.27) (3.37) (3.40) — (3.37) 
(4.6.1) 


where $S_{it} = 1$ in the $i$th quarter and zero otherwise. Estimate the seasonal
variation. Interpret your results.


Question 4.7 


Show that if a set of variables is deseasonalised by the use of dummy 
variables, a multiple regression of the deseasonalised data will lead to the 
same coefficients as the use of multiple regression with the original data 
plus dummy variables as additional independent variables.


Question 4.8


It is thought that the relationship between $y$ and a single explanatory
variable $x$ can be represented by two linear segments which intersect at
$x = x_A$.

(a) Stating carefully any assumptions that you make, describe how you 
would estimate this relationship. 

(b) Test the hypothesis of the relationship above against the alternative of
two disjoint segments which do not necessarily intersect at $x_A$.

(c) Test the model of part (a) above against the alternative of a linear 
model which is a single segment. 


Question 4.9
For the linear equation

$$y_t = x_t\beta + u_t \qquad (t = 1, \ldots, T) \tag{4.9.1}$$

where $E(u_t) = 0$, $E(u_t^2) = \sigma^2 x_t^2$, $E(u_t u_s) = 0$ for $t \neq s$ and $\sum_{t=1}^{T} x_t^2 = T$,
prove that the OLS estimator of $\beta$ is unbiased but inefficient and that its
variance formula yields a downwards biased estimate of the true variance.


Question 4.10 
Consider the linear model 


$$y_t = x_t\beta + u_t \tag{4.10.1}$$

where $x_t$ is non-stochastic with $\lim_{T\to\infty} T^{-1}\sum_{t=1}^{T} x_t^2 = m_{xx}$, which is finite,
non-zero and $\lim_{T\to\infty} T^{-1}\sum_{t=1}^{T} x_t x_{t-s} = \omega^s m_{xx}$ for all $s$ with $|\omega| < 1$. $u_t$ is
a disturbance with $E(u_t) = 0$ and $E(u_t u_{t-s}) = \rho^s\sigma^2$ for all $s$.

(a) Derive the asymptotic variance of $b$, the OLS estimator of $\beta$.

(b) Show that if $\omega = \rho$ then the asymptotic efficiency of $b$ is equal to
$(1 - \rho^2)/(1 + \rho^2)$.


(Adapted from University of London examinations, M Sc. (Econ) 1970)


Question 4.11 
For the linear model 

$$y_t = x_t'\beta + u_t \qquad (t = 1, \ldots, T) \tag{4.11.1}$$

with

$$u_t = \rho u_{t-1} + e_t, \qquad |\rho| < 1 \tag{4.11.2}$$

where the $e_t$ are independent $N(0, \sigma^2)$, show how the Cochrane–Orcutt


estimators of $\beta$ and $\rho$ are related to the maximum likelihood estimators of
$\beta$ and $\rho$.


Question 4.12


For the following linear model

$$y_t = x_t'\beta + u_t \qquad (t = 1, \ldots, T) \tag{4.12.1}$$

where $X' = (x_1, \ldots, x_T)$ is a matrix of observations of non-random
variables and $\lim_{T\to\infty} T^{-1}X'X$ is finite non-singular and

$$u_t = \rho u_{t-1} + e_t, \qquad |\rho| < 1 \tag{4.12.2}$$

$E(e_t) = 0$, $E(e_t^2) = \sigma^2$ and $E(e_t e_s) = 0$ for $t \neq s$,


(a) Show that the probability limit of the Durbin–Watson statistic is
$2(1 - \rho)$.

(b) Find the limiting distribution of the suitably standardised 
Durbin—Watson statistic when p = 0. 


Question 4.13 


Consider the linear model 


$$\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = \begin{bmatrix} X_1 \\ X_2 \\ X_3 \end{bmatrix}\beta + \begin{bmatrix} u_1 \\ u_2 \\ u_3 \end{bmatrix} \tag{4.13.1}$$

where $y_3$ represents missing observations on $y$, $X_2$ represents missing
observations on the $X$ variables, the random errors $u_i$ are independently
distributed $N(0, \sigma^2 I_{T_i})$, $y_i$ and $u_i$ are $T_i \times 1$ vectors and $X_i$ are $T_i \times K$
matrices ($i = 1, 2, 3$).

Derive and compare the properties of the following estimators of $\beta$:
(a) using OLS with only the complete observations for $y$ and $X$;
(b) using OLS with $X_2$ replaced by a zero matrix and the observations
corresponding to $y_3$ and $X_3$ dropped;
(c) using OLS on the complete model with both $y_3$ and $X_2$ replaced by
zeros; and
(d) treating $y_3$ and $X_2$ as unknown parameters and using the maximum
likelihood estimators of $y_3$, $X_2$ and $\beta$.


Question 4.14 
For the linear model 
$$y_{it} = \alpha_{it} + x_{it}'\beta + u_{it} \qquad (i = 1, \ldots, N;\ t = 1, \ldots, T) \tag{4.14.1}$$

where $\beta$ is a $k \times 1$ parameter vector, $x_{it}$ is non-stochastic, $E(u_{it}^2) = \sigma^2$ and
$E(u_{it}u_{js}) = 0$ for $i \neq j$ or $t \neq s$, there are observations for $T$ time periods
across $N$ cross-sectional units. Suppose $\alpha_{it} = \theta_i + \psi_t$ and $\theta_1 = \psi_1 = 0$.
(a) Derive an estimator of the coefficients $\theta_i$, $\psi_t$ and $\beta$.
(b) Obtain test statistics for the hypotheses 

(i) Ho: &y = 0,B #0 against Hy : a, #0, B#O 

(ii) H,: 0; = 0, W,, 8 #0 against Hy, . 


(ili) Hp against H, . . 
(c) For the model: 
$$y_{it} = \theta_i + \beta_1 + \beta_2 x_{it} + u_{it} \qquad (i = 1, 2, 3;\ t = 1, \ldots, 4) \tag{4.14.2}$$


the following observations are available:


cross-section units: 


Ne Vex gen 

time periods: 1 ag 18, LZ <5 Loe 2 
2 eas | 15 3'5 3 Ra: 

3 Sea 16 6 12°75" 

4 O52 160 7 14 4 


(i) Estimate (4.14.2) and 
(ii) test the hypothesis 


Ho: 64 = 6, — 0, Bi, Bo #x 0 against XE 05% 63, B,, By #x (0). 


Question 4.15 


Consider the following error (or variance) components model 


$$y_{it} = x_{it}'\beta + e_{it} \qquad (i = 1, \ldots, N;\ t = 1, \ldots, T) \tag{4.15.1}$$

where $\beta$ is a $k \times 1$ parameter vector, $k < NT$, $x_{it}$ is non-stochastic,

$$e_{it} = u_i + v_t + w_{it} \tag{4.15.2}$$

with the $u_i$ independent $N(0, \sigma_u^2)$, the $v_t$ independent $N(0, \sigma_v^2)$ and the $w_{it}$
independent $N(0, \sigma_w^2)$ variables. It is also assumed that $u_i$, $v_t$ and $w_{it}$ are
mutually independent.

(a) Interpret the model, comparing it with the covariance model in
Question (4.14).

(b) Explain how you would estimate the model when $\sigma_u^2/\sigma_w^2$ and $\sigma_v^2/\sigma_w^2$
are (i) known and (ii) unknown.


Question 4.16
Consider the following model

$$y_{it} = \alpha + x_{it}'\beta + u_i + v_t + w_{it} \qquad (i = 1, \ldots, N;\ t = 1, \ldots, T) \tag{4.16.1}$$

where $E(w_{it}) = 0$, $E(w_{it}^2) = \sigma_w^2$, $E(w_{it}w_{js}) = 0$ for $i \neq j$ or $t \neq s$, $\beta$ is a $k \times 1$
parameter vector, $x_{it}$ is non-stochastic, $k < NT$ and $\sum_{i,t} x_{it} = 0$.
(a) Interpret (4.16.1) when $u_i$ and $v_t$

(i) are constants,

(ii) are random variables with $E(u_i) = 0$, $E(u_i^2) = \sigma_u^2$, $E(u_iu_j) = 0$ for
$i \neq j$, $E(v_t) = 0$, $E(v_t^2) = \sigma_v^2$, $E(v_tv_s) = 0$ for ($t \neq s$) and $u_i$, $v_t$ and $w_{it}$ are
independently distributed.

(b) You may assume the following:

' 


1 lim 
N,T7> © 


a. (02, Int ar o2A F os By = ra (yr = VeA — 2B + 73] nr )(4.16.2) 


where $A = I_N \otimes J_T$, $B = J_N \otimes I_T$, $J_P$ is a $P \times P$ matrix whose
elements are all unity


is finite non-singular, 


Ou Op 


— ae ee eae An 
MSN O4 Gia” i YO Tae ey 


= f+ ——_—_—_—_—_—_——_— 
3 12 o2, + No2 + To2 


eine 1 
3. Inp —F(F'F)F' = [tae 


FA 5 Hite ane (4.16.3) 


where $F = (I_N \otimes \iota_T : \iota_N \otimes I_T)$ and $\iota_P$ denotes a $P \times 1$ vector of ones,
4. $\sigma_u^2$, $\sigma_v^2$ and $\sigma_w^2$ are known.


Prove that 
(i) the difference between the covariance matrices of the estimators of
$\beta$ obtained from the covariance and the error components models is
positive semi-definite,
(ii) these two estimators are asymptotically equivalent if $N \to \infty$ and
$T \to \infty$ in such a way that $N = aT$, where $a$ is a positive constant.


Question 4.17 


The behaviour of the $i$th micro-unit at time $t$ is assumed to be explained
by the model

$$y_{it} = x_{it}'\beta_i + z_t'\gamma + u_{it} \qquad (i = 1, \ldots, N;\ t = 1, \ldots, T) \tag{4.17.1}$$

where $\beta_i$ is a $k \times 1$ parameter vector and $\gamma$ is an $l \times 1$ parameter vector,
$x_{it}$ and $z_t$ are non-stochastic, $E(u_{it}^2) = \sigma^2$ and $E(u_{it}u_{js}) = 0$ for $i \neq j$ or
$t \neq s$.


The corresponding macro-model is assumed to be

$$Y_t = X_t'\beta + Z_t'\gamma + U_t \tag{4.17.2}$$

where $Y_t = \sum_{i=1}^{N} y_{it}$, $X_t = \sum_{i=1}^{N} x_{it}$, $Z_t = \sum_{i=1}^{N} z_t$ and $\beta$ and $\gamma$ are coefficient
vectors.
(a) Under what conditions does (4.17.2) represent the correct
macro-model for (4.17.1)?
(b) Assuming that these conditions for aggregation hold, how would you
estimate $\beta$ and $\gamma$ in (4.17.2)?
(c) Obtain an expression for the bias of these estimators when these
conditions do not hold.
(d) Given data on the micro-variables $y_{it}$, $x_{it}$ and $z_t$, derive a test for the
perfect aggregation of (4.17.1).


Question 4.18 


The following regression was run across countries in order to test the
hypothesis that variations in output per man ($Q/L$) in the iron and steel
industries of different countries are explained by differences in the money
wage rate ($W$):


$$\ln(Q/L) = 0.363 + 0.811 \ln W + u, \qquad R^2 = 0.936 \tag{4.18.1}$$


(0.051) 


where $u$ is the residual error and the figure in parentheses is the standard
error.

(a) Test this hypothesis. 

(b) It has been suggested that the equation above is mis-specified because it 
excludes the effects of variations in efficiency between countries which 
raise output per man and are positively correlated with money wages. 

How might this affect your conclusions? 


2. SUPPLEMENTARY QUESTIONS 


Question 4.19 


Consider the linear model


$$y_t = x_{1t}\beta_1 + x_{2t}\beta_2 + u_t \qquad (t = 1, \ldots, 10)$$

where $E(u_t) = 0$, $E(u_t^2) = 1$, $E(u_t u_s) = 0$ for $t \neq s$ and the moment matrix
of the variables is


         x_1    x_2     y
x_1       10      1     2
x_2        1     20     5
y          2      5    10


Suppose that the following extraneous information is available


re! Sag 

feo aac 
1 Owed Bo 

where $e_1$ and $e_2$ are random variables with zero means, $E(e_1^2) = 1$, $E(e_2^2) = 2$
and $E(e_1e_2) = 1$.

(a) Compute the unrestricted OLS estimators of $\beta_1$ and $\beta_2$ and their
covariance matrix.

(b) Using the extraneous information, compute more efficient estimators
of $\beta_1$ and $\beta_2$ and verify numerically that they are more efficient.


ey 


e2 


Question 4.20 
In the model 
$$y = X\beta + u \tag{4.20.1}$$

$X$ is a non-random $T \times k$ matrix of rank $k$ and $u$ is a normally distributed
vector of random disturbances with mean vector zero and covariance
matrix $\sigma^2 I$.

(a) If $D$ is a known $k \times l$ matrix of rank $l < k$ and

$$K = \{\beta : \beta = D\gamma \ \text{for some}\ \gamma\}$$

obtain a test statistic for the null hypothesis

$$H_0: \beta \in K$$

against the alternative

$$H_1: \beta \notin K$$

(b) $X$ is now partitioned into $X = [X_1 : X_2]$ where $X_1$ is $T \times q$ and $X_2$ is
$T \times (k - q)$ and the following new variables are defined

$$g_1 = X_1c_1, \qquad g_2 = X_2c_2$$

where $c_1$ and $c_2$ are known vectors ($g_1$ and $g_2$ may be regarded as
observations on aggregates of the variables in $X_1$ and $X_2$, respectively).
Using these new variables, the following new model is constructed

$$y = g_1\alpha_1 + g_2\alpha_2 + u \tag{4.20.2}$$

where $\alpha_1$ and $\alpha_2$ are scalar parameters.
Show how to apply the theory of Part (a) to test whether the new
model (4.20.2) is appropriate.


(Adapted from University of Essex MA examinations 1976) 


Question 4.21 


Explain how a set of dummy variables can be used to test whether there 

is any difference in the constant terms of a multiple regression equation in 
different sub-populations. Show how the equations can be estimated by 
using deviations from arithmetic means, and consider how appropriate 
significance tests can be computed when this is done. Can similar 
techniques be employed when the errors are serially correlated with a 
known covariance matrix? 


(University of London M Sc (Econ) examinations, 1976.) 


Question 4.22 

Consider the two-equation model 
Vi Pe= aD POZA 0 ho ee et Cz 1, ee e0) 
Von = Ogi Usts tO gha pt tay 


where E(u; +) = 0 (¢ = 1, 2) and E(u; -uj;,.) = oj. The above equations can 
be written in system form as 


2 O Xj) 1 Bs: U2 
where y, is a (20 x 1) vector, X, a (20 x 3) matrix, uw; a (20 x 1) vector 
etc or, more compactly as 
y = XB+u 


where E(uu’) = ®. Application of the Aitken estimator to the two 
equations gives 


Vite Oa @ 2) let 0, 088% st 01 39x5 
ant = —-1.3 + 0.058x3 4 ale 0.064x4 +. 


It is desired to test the hypotheses $b_2 = b_5$ and $b_3 = b_6$.
(a) Write the restrictions on the system equations in the form $R\beta = r$.
T2000 = 900-2


(b) If [R(X’BX) R71 = | 
I BIERA arf 960.2 670.0 


compute the 


Lagrangean multipliers for each restriction and their covariance matrix. 
Hence, provide a test on each of the restrictions separately and a joint 
test for the validity of both. 


Question 4.23 
(a) In the model 
y = XB+u 


$y$ is $T \times 1$, $X$ is a fixed $T \times k$ matrix of rank $k$ and $u$ is $N(0, \sigma^2 I)$. Let $F$
denote the $F$-statistic used to test $H_0: \beta = 0$ against $H_1: \beta \neq 0$ and let $F(l)$
denote the statistic used to test $H_0(l): l'\beta = 0$ against $H_1(l): l'\beta \neq 0$.
Prove that

$$\max_{l \in N} F(l) = kF$$

where $N = \{l : l'(X'X)^{-1}l = 1\}$ is a normalisation set.
(b) For the special case of the model in Part (a), 


Vit Bitar © bake te (eg Dy 1) 
the following data on sums of squares and cross-products were obtained 
for a sample size T = 27: 
y xX, X2 
y 1106 34 25 
x2 25 1 Me 


(i) Test the hypothesis 


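$$H_0: \begin{pmatrix}\beta_1 \\ \beta_2\end{pmatrix} = 0 \quad \text{against} \quad H_1: \begin{pmatrix}\beta_1 \\ \beta_2\end{pmatrix} \neq 0$$

at the 5% significance level.
(ii) Determine by a simultaneous comparisons procedure whether it is
possible to say at a significance level of at most 5% that one or both of
the least squares estimates $\hat\beta_1$ and $\hat\beta_2$ of $\beta_1$ and $\beta_2$ are significantly different
from zero.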


(University of Essex MA examinations, 1977.)


Question 4.24


$B_t$ denotes the number of bankruptcies occurring during period $t$ amongst
a (large) number of retailers in existence at the start of period $t$. The
average rate of interest on hire purchase credit during period $t$ is denoted
by $r_t$. It is known that, conditional on $r_t$, $B_t$ has a Poisson distribution
with parameter $\lambda_t = \alpha + \beta r_t$.

(a) Derive the regression function of $B_t$ on $r_t$.
(b) Explain why an ordinary least squares regression applied to the 
equation 


$$B_t = \alpha + \beta r_t + u_t$$

where $u_t$ is a random disturbance, yields inefficient estimators of $\alpha$ and $\beta$.
(c) Suppose $T$ independent observations are available on $B_t$ and $r_t$. Derive
the likelihood function for $\alpha$ and $\beta$ and find the equations whose solution
yields the maximum likelihood estimators of $\alpha$ and $\beta$.


(Adapted from University of Birmingham B Soc.Sc. examinations, 1977.) 


Question 4.25 
The model 
$$y_t = \beta_0 + \beta_1 x_t + \beta_2 D_{2t} + \beta_3 D_{3t} + \beta_4 D_{4t} + u_t$$

was estimated by ordinary least squares (OLS) from quarterly time series
data, when


$D_{2t} = 1$ in quarter 2, 0 otherwise.
$D_{3t} = 1$ in quarter 3, 0 otherwise.
$D_{4t} = 1$ in quarter 4, 0 otherwise.


Show that $\hat\beta_1$, the OLS estimate of $\beta_1$, is equivalent to the coefficient of
$x_t^*$ in the regression $y_t^* = \alpha + \beta x_t^* + w_t$ where $y_t^*$ is the residual in the
regression of $y_t$ on $D_{2t}$, $D_{3t}$, $D_{4t}$ and a constant and $x_t^*$ is the residual in
the regression of $x_t$ on $D_{2t}$, $D_{3t}$, $D_{4t}$ and a constant. Give an interpretation
and suggest a practical use for this result.


(University of London M Sc. (Econ) examinations, 1977.) 


Question 4.26 


It is thought that the relationship between $y$ and a single explanatory
variable $x$ can be represented by three linear segments, intersecting in turn
at $x = x_A$ and $x = x_B$. Describe the necessary computations and construct
appropriate tests to carry out the following analyses.

(a) Test the restriction that the three linear segments join at $x = x_A$ and
$x = x_B$, against the unrestricted alternative of three disjoint linear segments.
(b) Test the null hypothesis 'the relation between $y$ and $x$ is linear' against
the alternative 'the relation is piecewise linear, segments intersecting at
$x = x_A$ and $x = x_B$ in turn'. State carefully any assumptions that you
make.


(University of London, B Sc. (Econ) examinations, 1975.) 


Question 4.27 


After the linear model 


$$y_t = x_t'\beta + u_t$$

where $E(u_t) = 0$, $E(u_t^2) = \sigma^2$ and $E(u_t u_s) = 0$ for $t \neq s$ has been estimated
from a sample of $T$ observations, a further $n$ observations become
available ($n < k$), and it is desired to test whether these additional
observations satisfy the same linear model as the original sample.
(a) Develop an appropriate F-statistic for this purpose, stating carefully 
any assumptions that you make. 
(b) In the case n = 1, develop an appropriate t-statistic for testing the 
hypothesis that the single new observation obeys the same linear relation 
as the original sample. 
(c) If $n = 1$, show that the tests of parts (a) and (b) are equivalent. (You
may use the relation

$$Q_{T+1} = Q_T + \frac{(y_{T+1} - x_{T+1}'\hat\beta)^2}{1 + x_{T+1}'(X'X)^{-1}x_{T+1}}$$

where $Q_T$ and $Q_{T+1}$ are the residual sums of squares in regressions based
on $T$ and $T + 1$ observations respectively, $y_{T+1}$ and $x_{T+1}$ represent
the new observation and $X' = (x_1, x_2, \ldots, x_T)$.)


(University of London B Sc. (Econ) examinations, 1975.) 
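
The updating relation quoted in part (c) of Question 4.27 can be checked numerically. The Python sketch below is only an illustration under stated assumptions: it takes a single regressor with no intercept (so that $X'X$ is a scalar), the data values are invented, and the helper names `rss` and `updated_rss` are hypothetical rather than taken from the text.

```python
def rss(x, y):
    """Residual sum of squares and slope from regressing y on x (no intercept)."""
    sxx = sum(a * a for a in x)
    b = sum(a * c for a, c in zip(x, y)) / sxx
    return sum((c - a * b) ** 2 for a, c in zip(x, y)), b


def updated_rss(x, y, x_new, y_new):
    """Q_{T+1} obtained from Q_T by the updating relation, without refitting."""
    q_t, b = rss(x, y)
    sxx = sum(a * a for a in x)              # X'X for a single regressor
    return q_t + (y_new - x_new * b) ** 2 / (1.0 + x_new * x_new / sxx)


x = [1.0, 2.0, 3.0, 4.0, 5.0]                # hypothetical sample, T = 5
y = [1.1, 1.9, 3.2, 3.9, 5.1]
x_new, y_new = 6.0, 7.0                      # hypothetical extra observation

q_direct, _ = rss(x + [x_new], y + [y_new])  # refit on all T + 1 observations
q_update = updated_rss(x, y, x_new, y_new)
print(q_direct, q_update)                    # the two values should agree
```

The same check extends to several regressors by replacing the scalar `sxx` with the matrix $X'X$; the scalar case is used here only to keep the sketch dependency-free.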


Question 4.28 


(a) Quarterly observations in the period 1953(3)—1965(4) are available 
on the following variables: 


I, = gross investment in quarter t¢ 

Y,; = gross domestic product in quarter ¢ 

C, = gross consumption expenditure in quarter ¢ 
r, = long term interest rate in quarter ¢ 


All variables are seasonally adjusted and in current prices. Suppose the
relationship (4.28.1) were fitted using ordinary least squares (OLS) with
the results shown in Table 1.

$$I_t = \beta_0 + \beta_1 Y_t + \beta_2(C_t - C_{t-1}) + \beta_3 r_{t-2} + \beta_4 I_{t-1} + u_t \tag{4.28.1}$$


Table 1 
Coefficient Bo By Bo B3 Ba 
Estimate 6.166 0.0604 0°502) "3.300 0.663 
Standard e 
Error 0.018 0.249 ie Be) 0.097 


$R^2 = 0.947$, sum of squared residuals = 106.32
d = 1.95 (Durbin—Watson statistic). 

No. of observations = 47 

(from Moroney and Mason 1972). 




(i) Test $H_0: \rho = 0$ against $H_1: \rho > 0$ where $u_t = \rho u_{t-1} + e_t$.

(ii) Test $H_0: \beta_3 = 0$ against $H_1: \beta_3 < 0$.
(b) Suppose there were some doubt concerning the effectiveness of the 
deseasonalisation techniques applied to the variables in (a). To ascertain 
whether any seasonal effect remained one might consider estimating the 
coefficients of the following equation: 


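$$I_t = \beta_0 + \beta_1 Y_t + \beta_2(C_t - C_{t-1}) + \beta_3 r_{t-2} + \beta_4 I_{t-1} + \gamma_1 D_{1t} + \gamma_2 D_{2t} + \gamma_3 D_{3t} + u_t \tag{4.28.2}$$


where $D_{it} = 1$ if observation number $t$ relates to quarter $i$,
$D_{it} = 0$ otherwise.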
Suppose the results of OLS estimation were as shown in Table 2. 


Table 2 
Coefficient      $\beta_0$   $\beta_1$   $\beta_2$   $\beta_3$    $\beta_4$   $\gamma_1$   $\gamma_2$   $\gamma_3$
Estimate         6.050   0.058   0.510   -3.100   0.652   1.170   0.820   -0.010
Standard error           0.020   0.230    1.810   0.010   0.600   0.500    0.010
$R^2 = 0.983$, sum of squared residuals = 87.24
$d = 1.98$


No. of observations = 47 


(i) Test $H_0: \gamma = 0$ against $H_1: \gamma \neq 0$ where $\gamma' = (\gamma_1, \gamma_2, \gamma_3)$.
(ii) Why are only three seasonal dummies included when there are 
four quarters whose separate effects have to be explained? 


(Adapted from University of Birmingham M Soc.Sc. examinations, 1974.) 


Question 4.29


Consider the linear model

$$y_t = \beta x_t + u_t \qquad (t = 1, \ldots, T)$$

where

$$u_t = \rho u_{t-1} + e_t$$

the $e_t$ are independent $N(0, \sigma^2)$. Also, $|\rho| < 1$, $u_0 = 0$ and $(x_1, \ldots, x_T)$ are
fixed in repeated samples. Carefully explain the principles of using Monte
Carlo simulation methods to study the finite sample distribution of the
maximum likelihood estimator $(\hat\beta, \hat\rho, \hat\sigma^2)$ of $(\beta, \rho, \sigma^2)$. Include a brief
description of how to investigate the following hypotheses:

(a) $E(\hat\beta) = \beta$

(b) $E[\hat\rho - E(\hat\rho)]^2 = (1 - \rho^2)/T$

(c) the size of the test: reject $H_0: \beta = 0$ if $|\hat\beta|/\sqrt{\operatorname{var}(\hat\beta)} > 2$ is 0.05.

(d) $\hat\beta$ is more efficient than the ordinary least squares estimator of $\beta$.


(University of London B Sc. (Econ) examinations, 1977.)


Question 4.30 


From a sample of observations on Y; and X; an investigator calculates the 
following moments in terms of deviations about means, the entries in the 

table (m;;) being based a 102 sums of squares or cross products so that, 

for age m3. = 224 aXe ios h) (X, — X) = 74 where Y = 

toz Li2, Y;-, and X = rhe 2i23, x 


Y, Ae Yes 


Y, 150 

X, 100 100 

Youna 123) 74) e138 
Mediate 8? HsiS ate 92H 290 


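(a) Estimate $\alpha$ in $Y_t = \alpha X_t + U_t$ ($t = 2, \ldots, T$) and test $H_0: \alpha = 0$. Is it
useful to test $H_0: \alpha = 1$?
(b) Letting $\hat{U}_t = Y_t - \hat\alpha X_t$ ($t = 2, \ldots, T$), the estimate of $\rho$ in $U_t = \rho U_{t-1}
+ V_t$ is $\hat\rho = 0.955$ and so you decide to reformulate the equation as


$$(Y_t - Y_{t-1}) = \alpha(X_t - X_{t-1}) + E_t \qquad (t = 2, \ldots, T).$$


Re-estimate $\alpha$ and now test $H_0: \alpha = 1$.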
(c) Suggest an economic interpretation for Y; and X; where such null 
hypotheses occur. 


(University of London, MSc. (Econ) examinations, 1977.)


Question 4.31


(a) What are the essential features of a good numerical minimisation 
algorithm? Explain in detail why and under what conditions the 
Newton—Raphson algorithm is superior to the steepest descent algorithm. 
(b) Describe and justify your choice of a computationally efficient
algorithm to obtain statistical estimates of $\rho$ and the $k$ parameters in $\beta$,
which are consistent and asymptotically efficient, for the linear model


$$y = X\beta + u$$


when the errors $u$ are generated by a first order autoregressive process
with serial correlation coefficient $\rho$. (Assume that the $T \times k$ matrix $X$
consists of $T$ observations on $k$ "fixed" regressors.)


(University of London M Sc. (Econ) examinations, 190 


Question 4.32 


It is required to test the multiple regression estimates of the equation 


$$y_{ht} = \sum_i \beta_i x_{iht} + \gamma_h + \delta_t + u_{ht}$$

from a sample of panel data for a set of households, $h = 1, \ldots, H$,
$t = 1, \ldots, T$.


Assuming first that $\gamma_h$ and $\delta_t$ are non-stochastic, show how the
equation can be estimated using covariance-transformed variables, and
show how it is possible using these estimates to test the following
alternative hypotheses:

(i) that all $\gamma_h = 0$,
(ii) that all $\gamma_h$ and $\delta_t = 0$,
(iii) that $\beta_i$ depends on $h$, but that otherwise the above equation is
valid.

Show how an F-ratio test can also be derived by taking (i) as null
hypothesis and taking (iii) as alternative hypothesis. Discuss the
relationship between this F-ratio test and the preceding tests.

Comment on the case where the $\gamma_h$ and $\delta_t$ are assumed to be stochastic.


(University of London M Sc. (Econ) examinations, 1977) 


Question 4.33 


An investigator tentatively hypothesises the following relationship for the 
aggregate demand for money M, at time t: 


$$M_t = \beta_1 Y_t + \beta_2 I_t + u_t \tag{4.33.1}$$

where $Y_t$ is money national income and $I_t$ is the rate of interest. He
estimates (4.33.1) by ordinary least squares from a time-series of 50
observations for which $M/Y = 4$ and the correlations between $M_t$ and $t$
and $Y_t$ and $t$ are 0.995 and 0.980 respectively and obtains

$$M_t = \underset{(0.05)}{2}\,Y_t - \underset{(30)}{4}\,I_t, \qquad R^2 = 0.990, \quad DW = 0.5 \tag{4.33.2}$$

(standard errors in parentheses). Since the coefficient of $I_t$ is "insignificant"
he drops that variable, but because of the low DW (Durbin–Watson)
statistic, he decides to include $M_{t-1}$. Re-estimation by OLS produces

$$M_t = \underset{(0.04)}{1.0}\,Y_t + \underset{(0.10)}{0.7}\,M_{t-1}, \qquad R^2 = 0.996, \quad DW = 2.1 \tag{4.33.3}$$

which the investigator accepts as a "satisfactory" relationship. Critically
appraise this entire "model building" exercise. Justify all your criticisms
in the light of both econometric theory and the empirical evidence
reported.


(University of London B Sc. (Econ) examinations, 1977.)


3. SOLUTIONS 
Solution 4.1 


Part (a) $R^2$ is defined as the ratio of the explained to the actual variance
of $y$. Writing equation (4.1.1) more compactly as

$$y = X\beta + u \tag{4.1.4}$$

where $X = (x_1, x_2)$, $\beta' = (\beta_1, \beta_2)$, the explained variance of $y$ is $T^{-1}b'X'Xb$
where $b$ is the OLS estimator of $\beta$. Hence,

$$R^2 = \frac{b'X'Xb}{y'y} \tag{4.1.5}$$

Substituting in (4.1.5) for $b = (X'X)^{-1}X'y$ and simplifying we have

$$R^2 = \frac{y'X(X'X)^{-1}X'y}{y'y} = \frac{y'y - y'[I - X(X'X)^{-1}X']y}{y'y} = 1 - \frac{s^2}{m_{yy}}$$

where $s^2 = T^{-1}y'[I - X(X'X)^{-1}X']y$ is an estimator of $\sigma^2$. We note that
$s^2$ is not an unbiased estimator of $\sigma^2$.

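Part (b) We know that an estimate of the covariance matrix of $b$ is given
by

$$s^2(X'X)^{-1} = \frac{s^2}{T}\begin{bmatrix} m_{11} & m_{12} \\ m_{12} & m_{22}\end{bmatrix}^{-1} = \frac{s^2}{T(m_{11}m_{22} - m_{12}^2)}\begin{bmatrix} m_{22} & -m_{12} \\ -m_{12} & m_{11}\end{bmatrix}$$

Therefore,

$$\operatorname{var}(b_i) = \frac{s^2 m_{jj}}{T(m_{11}m_{22} - m_{12}^2)} \quad (\text{for } i \neq j) \; = \frac{s^2}{T m_{ii}(1 - r^2)} \tag{4.1.6}$$

where $r^2 = m_{12}^2/(m_{11}m_{22})$. From (4.1.2)

$$s^2 = (1 - R^2)\, m_{yy} \tag{4.1.7}$$

Hence substituting (4.1.7) into (4.1.6) we obtain the required result

$$\operatorname{var}(b_i) = \frac{T^{-1}(1 - R^2)\, m_{yy}}{(1 - r^2)\, m_{ii}}$$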


Part (c) As the degree of collinearity between $x_1$ and $x_2$ increases $r^2$
increases, approaching unity for perfect collinearity. But as $r^2 \to 1$, from
equation (4.1.3) we see that $\operatorname{var}(b_i) \to \infty$. Thus multicollinearity will
result in large values of the variance of the OLS estimates, that is, in very
imprecise estimates.
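
As a numerical illustration of Solution 4.1, the Python sketch below computes the estimated variance of $b_i$ twice, once from the relevant diagonal element of $s^2(X'X)^{-1}$ and once from the formula in (4.1.3), and shows how it grows as $r$ approaches one. The function name `var_b` and the moment values are assumptions made for the example: the regressors are given unit second moments and correlation $r$, the coefficients are set to $(1, 1)$ and the residual variance to one.

```python
def var_b(i, T, m11, m22, m12, m1y, m2y, myy):
    """Return (direct, formula) estimates of var(b_i) for i in {1, 2}."""
    det = m11 * m22 - m12 ** 2
    # OLS coefficients from the moment equations b = M^{-1} m_xy
    b1 = (m22 * m1y - m12 * m2y) / det
    b2 = (m11 * m2y - m12 * m1y) / det
    # s^2 = T^{-1} * residual sum of squares = m_yy - b' m_xy
    s2 = myy - (b1 * m1y + b2 * m2y)
    R2 = 1.0 - s2 / myy                          # equation (4.1.2)
    r2 = m12 ** 2 / (m11 * m22)
    mii, mjj = (m11, m22) if i == 1 else (m22, m11)
    direct = s2 * mjj / (T * det)                # diagonal element of s^2 (X'X)^{-1}
    formula = (1.0 - R2) * myy / (T * (1.0 - r2) * mii)   # equation (4.1.3)
    return direct, formula


T = 100
for r in (0.0, 0.9, 0.99):
    # Illustrative moments: m11 = m22 = 1, m12 = r, coefficients (1, 1), s^2 = 1
    m1y = m2y = 1.0 + r
    myy = 3.0 + 2.0 * r
    direct, formula = var_b(1, T, 1.0, 1.0, r, m1y, m2y, myy)
    print(r, direct, formula)
```

With these invented moments the variance is $1/[T(1 - r^2)]$, so moving from $r = 0$ to $r = 0.99$ multiplies it roughly fifty-fold.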


Solution 4.2 


Part (a) From the definition of eigenvalues and eigenvectors we have for
$i = 1, \ldots, k$

$$X'XQ_i = \lambda_i Q_i$$

or

$$X'XQ = Q\Lambda$$

where

$$Q = (Q_1 \ldots Q_k), \qquad Q'Q = I,$$

and

$$\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_k).$$

$Q_i$ is known as the $i$th eigenvector and $\lambda_i$ the corresponding eigenvalue. In
this case $\lambda_i \geq 0$ for all $i$ (Theil, 1971, pp. 26–28). It follows that

$$X'X = Q\Lambda Q'$$

and

$$(X'X)^{-1} = Q\Lambda^{-1}Q'$$

if $\lambda_i > 0$ for all $i$. Consequently, we can write the covariance matrix of $b$
as

$$\sigma^2(X'X)^{-1} = \sigma^2 Q\Lambda^{-1}Q' = \sigma^2\sum_{i=1}^{k}\lambda_i^{-1}Q_iQ_i' \tag{4.2.2}$$

which is our required result.

Part (b) Perfect collinearity between two or more explanatory variables 
implies that at least one A; = 0. More precisely, if r explanatory variables 
are linearly independent and hence the remaining k —r can be formed as 
linear combinations of these r, then the rank of X and of X’X will be r and 
there will be k —r zero eigenvalues. Near collinearity means that one or 
more eigenvalues will be small, nearly zero, and so their reciprocals will be 
very large. The contribution of these terms in equation (4.2.2) will 
dominate, leading to a large covariance matrix of b. This is one observable 
effect of the presence of multicollinearity. Another is unstable coefficient 
values due to rounding error problems in inverting X’X. When there is 
multicollinearity, the pivotal condensation methods of inverting a matrix 
will be using pivotal elements which are just rounding errors. To see this, 
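let $k = 2$ and let $x_{2t} = a x_{1t} + e_t$ be the regression of $x_2$ on $x_1$, then

$$X'X = T\begin{bmatrix} m_{11} & m_{12} \\ m_{21} & m_{22}\end{bmatrix} = T\begin{bmatrix} m_{11} & a m_{11} \\ a m_{11} & a^2 m_{11} + m_{ee}\end{bmatrix} = T m_{11}\begin{bmatrix} 1 & a \\ a & a^2 + \theta\end{bmatrix}$$

where $m_{ij} = T^{-1}\sum_{t=1}^{T} x_{it}x_{jt}$ for $i, j = 1, 2$, $m_{ee} = T^{-1}\sum_{t=1}^{T} e_t^2$ and $\theta =
m_{ee}/m_{11}$. After pivotal reduction of $X'X$ using the top left hand element
of $X'X$ as the pivot we obtain the matrix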


m 
11 Ean 


If we now pivot on the element $\theta$ we obtain the inverse of $X'X$. Suppose,
however, that $x_1$ and $x_2$ are nearly collinear then, apart from rounding
errors, $\theta$ will be close to zero and hence either we shall not be able to
perform this second pivotal reduction or, if we can, the elements of
$(X'X)^{-1}$ will be large, as they will reflect the attempt to divide by a near
zero number.
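
The role of the small eigenvalue in (4.2.2) can be seen in a two-regressor example. The Python sketch below is illustrative only: it assumes $k = 2$ with a non-zero off-diagonal element, so that the eigenvalues and eigenvectors of $X'X$ have a closed form, and the function names and the values $T = 100$, $r = 0.5$ and $r = 0.999$ are invented for the purpose. It builds $\sigma^2\sum_i \lambda_i^{-1}Q_iQ_i'$ and compares it with $\sigma^2(X'X)^{-1}$ computed directly.

```python
import math


def eig_sym_2x2(a, b, c):
    """Eigenvalues and unit eigenvectors of [[a, b], [b, c]], assuming b != 0."""
    mean, half_gap = (a + c) / 2.0, math.hypot((a - c) / 2.0, b)
    pairs = []
    for lam in (mean + half_gap, mean - half_gap):
        # (b, lam - a) satisfies the first row of (A - lam*I) q = 0
        norm = math.hypot(b, lam - a)
        pairs.append((lam, (b / norm, (lam - a) / norm)))
    return pairs


def cov_from_eigen(a, b, c, sigma2=1.0):
    """sigma^2 * sum_i lambda_i^{-1} Q_i Q_i' as ((v11, v12), (v12, v22))."""
    v11 = v12 = v22 = 0.0
    for lam, (q1, q2) in eig_sym_2x2(a, b, c):
        v11 += sigma2 * q1 * q1 / lam
        v12 += sigma2 * q1 * q2 / lam
        v22 += sigma2 * q2 * q2 / lam
    return (v11, v12), (v12, v22)


def cov_direct(a, b, c, sigma2=1.0):
    """sigma^2 (X'X)^{-1} from the usual 2 x 2 inverse."""
    det = a * c - b * b
    return (sigma2 * c / det, -sigma2 * b / det), (-sigma2 * b / det, sigma2 * a / det)


T = 100.0
for r in (0.5, 0.999):
    a, b, c = T, T * r, T                # X'X for two regressors with correlation r
    lam_small = min(lam for lam, _ in eig_sym_2x2(a, b, c))
    print(r, lam_small, cov_from_eigen(a, b, c)[0][0], cov_direct(a, b, c)[0][0])
```

In the second case the smaller eigenvalue is about 0.1 against 199.9 for the larger one, so its reciprocal accounts for almost all of the variance of each coefficient.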


Part (c) There are $k$ principal components of $X$ obtained from $C = XQ$;
the $i$th principal component is $C_i = XQ_i$ and $C = (C_1 \ldots C_k)$. Recalling
that $QQ' = I$ we can insert $QQ'$ in equation (4.2.1) to obtain

$$y = XQQ'\beta + u = C\delta + u \tag{4.2.3}$$

where $\delta = Q'\beta$ is a length $k$ column vector of coefficients of the $k$
principal components. Equation (4.2.3) can also be written as

$$y = \sum_{i=1}^{k} C_i\delta_i + u \tag{4.2.4}$$

Equation (4.2.4) explains $y$ by a linear model in which the principal
components $C_i$ ($i = 1, \ldots, k$) are the explanatory variables.

Consider the explained sum of squares in the regression of $y$ on $C_1$,
$C_2, \ldots, C_k$; it is obtained from the decomposition

$$y'y = d'C'Cd + \hat{u}'\hat{u}$$

where $d$ is the OLS estimator of $\delta$, $\hat{u}'\hat{u}$ is the residual sum of squares and
$d'C'Cd$ is the explained sum of squares. But

$$C'C = Q'X'XQ = \Lambda$$

hence

$$d'C'Cd = d'\Lambda d = \sum_{i=1}^{k}\lambda_i d_i^2$$

From this it can be seen that the principal components corresponding to
the smallest eigenvalues have the least explanatory power; those
corresponding to zero eigenvalues have no contribution to make to
explaining variations in $y$.

This suggests that a possible test for multicollinearity is to use the
hypothesis that a sub-set of the $\delta_i$ are zero against the alternative that
they are non-zero. Conventional $F$ or $t$ tests result. We may note that if
there is perfect collinearity and hence $\lambda_i = 0$ ($i = r + 1, \ldots, k$) then
the corresponding $\delta_i = Q_i'\beta = 0$. We should also note that even though we
may accept the presence of multicollinearity using this test, it may still be
possible to invert $X'X$ and hence obtain OLS estimates of $\beta$.
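
The decomposition of the explained sum of squares into the terms $\lambda_i d_i^2$ can be verified on a small data set. The following Python sketch is an illustration under assumptions: two regressors in deviation form with a non-zero cross-product, five invented observations, and hypothetical helper names. It forms the principal components, checks that they are orthogonal with $C_i'C_i = \lambda_i$, and splits the explained sum of squares by component.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def principal_components(x1, x2):
    """Return [(lambda_i, C_i)] for X = (x1, x2), assuming x1'x2 != 0."""
    a, b, c = dot(x1, x1), dot(x1, x2), dot(x2, x2)      # elements of X'X
    mean, half_gap = (a + c) / 2.0, math.hypot((a - c) / 2.0, b)
    out = []
    for lam in (mean + half_gap, mean - half_gap):
        norm = math.hypot(b, lam - a)
        q1, q2 = b / norm, (lam - a) / norm              # unit eigenvector Q_i
        out.append((lam, [q1 * u + q2 * v for u, v in zip(x1, x2)]))  # C_i = X Q_i
    return out


def explained_by_component(x1, x2, y):
    """Rows (lambda_i, d_i, lambda_i * d_i^2) from regressing y on the components."""
    rows = []
    for lam, comp in principal_components(x1, x2):
        d = dot(comp, y) / lam       # OLS coefficient, since C'C is diagonal
        rows.append((lam, d, lam * d * d))
    return rows


def explained_ss_ols(x1, x2, y):
    """Explained sum of squares b'X'y from the regression of y on x1 and x2."""
    a, b, c = dot(x1, x1), dot(x1, x2), dot(x2, x2)
    g1, g2 = dot(x1, y), dot(x2, y)
    det = a * c - b * b
    b1, b2 = (c * g1 - b * g2) / det, (a * g2 - b * g1) / det
    return b1 * g1 + b2 * g2


x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]     # hypothetical, nearly collinear regressors
x2 = [-1.9, -1.2, 0.1, 0.9, 2.1]
y = [-4.1, -1.8, 0.2, 2.1, 3.6]
for row in explained_by_component(x1, x2, y):
    print(row)
print(explained_ss_ols(x1, x2, y))
```

The contributions in the last column of the printed rows add up to the explained sum of squares of the original regression, so each component's share of the fit can be read off directly.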


Solution 4.3 


Part (a) The OLS estimator of $\beta_1$ obtained from (4.3.1) is

$$b_1 = (X_1'X_1)^{-1}X_1'y = \beta_1 + (X_1'X_1)^{-1}X_1'(X_2\beta_2 + u)$$

As $E(u) = 0$ and $X_1$ and $X_2$ are fixed we can show that

$$E(b_1) = \beta_1 + (X_1'X_1)^{-1}X_1'X_2\beta_2$$

Thus the bias expression required is

$$E(b_1) - \beta_1 = (X_1'X_1)^{-1}X_1'X_2\beta_2 \tag{4.3.4}$$


Part (b) In general it is difficult to say much about the magnitude and
direction of the bias. Let us, therefore, consider the special case where
$X_1$ and $X_2$ are each a single variable and the
variables are measured as deviations about their means. The bias can now
be written as

$$\text{bias} = m_{12}\beta_2/m_{11} = r\beta_2(m_{22}/m_{11})^{1/2} \tag{4.3.5}$$

where $m_{ij} = T^{-1}\sum_{t=1}^{T} x_{it}x_{jt}$ for $i, j = 1, 2$ and $r$ is the correlation between
$X_1$ and $X_2$. It is clear from (4.3.5) that the bias is greater in absolute size
the greater is $r$, $\beta_2$ and $m_{22}/m_{11}$. The sign of the bias depends on the sign
of $r\beta_2$. This result suggests the following generalisation: the greater the
correlation between the excluded and the included variables and the
greater the variance of the excluded relative to the included variables, the
greater the bias.
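
The two forms of the bias in (4.3.5) can be compared with the expected value of the short-regression slope. The Python sketch below is a minimal illustration assuming one included and one excluded variable, both in deviation form; the data and the coefficient values $\beta_1 = 1$, $\beta_2 = 0.5$ are invented, and the function names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def omitted_variable_bias(x1, x2, beta2):
    """Both forms of the bias in (4.3.5) for scalar regressors."""
    T = len(x1)
    m11, m12, m22 = dot(x1, x1) / T, dot(x1, x2) / T, dot(x2, x2) / T
    r = m12 / math.sqrt(m11 * m22)
    return m12 * beta2 / m11, r * beta2 * math.sqrt(m22 / m11)


def expected_short_slope(x1, x2, beta1, beta2):
    """E(b1) when y = x1*beta1 + x2*beta2 + u, E(u) = 0, is regressed on x1 only."""
    expected_y = [beta1 * a + beta2 * b for a, b in zip(x1, x2)]
    return dot(x1, expected_y) / dot(x1, x1)


x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]     # hypothetical included variable
x2 = [-1.9, -1.2, 0.1, 0.9, 2.1]     # hypothetical excluded variable
beta1, beta2 = 1.0, 0.5

bias_moments, bias_corr = omitted_variable_bias(x1, x2, beta2)
print(bias_moments, bias_corr, expected_short_slope(x1, x2, beta1, beta2) - beta1)
```

Because the disturbance has zero mean, replacing $y$ by its expectation gives $E(b_1)$ exactly, which is why no random numbers are needed for the check.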


Part (c) Let $b_1^*$ be the OLS estimator of $\beta_1$ obtained from equation
(4.3.3), then

$$b_1^* = (X^{*\prime}X^*)^{-1}X^{*\prime}y$$

Substituting the correct expression for $y$ from (4.3.2) we obtain

$$b_1^* = (X^{*\prime}X^*)^{-1}X^{*\prime}(X_1\beta_1 + X_2\beta_2 + u)$$

By construction

$$X^* = X_1 - P_2X_1 = (I - P_2)X_1$$

where $P_2 = X_2(X_2'X_2)^{-1}X_2'$. Consequently,

$$X^{*\prime}X^* = X_1'(I - P_2)X_1 \quad \text{and} \quad X^{*\prime}X_2 = X_1'(I - P_2)X_2 = 0$$

Therefore,

$$b_1^* = [X_1'(I - P_2)X_1]^{-1}[X_1'(I - P_2)X_1\beta_1 + X_1'(I - P_2)u] = \beta_1 + [X_1'(I - P_2)X_1]^{-1}X_1'(I - P_2)u$$

Thus $E(b_1^*) = \beta_1$ and $b_1^*$ is an unbiased estimator of $\beta_1$. In other words, we
can obtain an unbiased estimator of the coefficients of a sub-set of
variables by regressing $y$ on the residuals from a regression of this sub-set
on the other regressor variables.
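
The result of Part (c) can be illustrated for one variable of each kind. The Python sketch below is only a demonstration under assumptions: scalar $X_1$ and $X_2$ in deviation form, invented data, and hypothetical function names. It regresses the expected value of $y$ on the residuals of $x_1$ on $x_2$ and recovers $\beta_1$, whereas regressing on $x_1$ itself does not.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def residual_regression_slope(x1, x2, y):
    """Slope from regressing y on x*, the residuals of x1 regressed on x2."""
    g = dot(x2, x1) / dot(x2, x2)
    x_star = [a - g * b for a, b in zip(x1, x2)]
    return dot(x_star, y) / dot(x_star, x_star)


def short_regression_slope(x1, y):
    """Slope from regressing y on x1 alone, omitting x2."""
    return dot(x1, y) / dot(x1, x1)


x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]     # hypothetical data in deviation form
x2 = [-1.9, -1.2, 0.1, 0.9, 2.1]
beta1, beta2 = 1.0, 0.5
# Expected value of y under the true model (4.3.2); E(u) = 0
expected_y = [beta1 * a + beta2 * b for a, b in zip(x1, x2)]

print(residual_regression_slope(x1, x2, expected_y))   # equals beta1 up to rounding
print(short_regression_slope(x1, expected_y))          # beta1 plus the bias
```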
Solution 4.4


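Part (a) From equation (4.4.1) we can write

$$y - X_1b_1 = X_2\beta_2 + \tilde{u} \tag{4.4.2}$$

where $\tilde{u} = u - X_1(b_1 - \beta_1)$, $E(\tilde{u}) = 0$ and

$$E(\tilde{u}\tilde{u}') = E[u - X_1(b_1 - \beta_1)][u - X_1(b_1 - \beta_1)]' = E(uu') + X_1E(b_1 - \beta_1)(b_1 - \beta_1)'X_1' = \sigma^2 I + X_1V_1X_1'$$

The vector of regression coefficients of $y - X_1b_1$ on $X_2$ is

$$b_2 = (X_2'X_2)^{-1}X_2'(y - X_1b_1) \tag{4.4.3}$$

Substituting (4.4.2) into (4.4.3) yields

$$b_2 = \beta_2 + (X_2'X_2)^{-1}X_2'\tilde{u} \tag{4.4.4}$$

Since $X_2$ is non-stochastic and $E(\tilde{u}) = 0$, $E(b_2) = \beta_2$ and the covariance
matrix of $b_2$ is

$$E(b_2 - \beta_2)(b_2 - \beta_2)' = (X_2'X_2)^{-1}X_2'E(\tilde{u}\tilde{u}')X_2(X_2'X_2)^{-1} = \sigma^2(X_2'X_2)^{-1} + (X_2'X_2)^{-1}X_2'X_1V_1X_1'X_2(X_2'X_2)^{-1} = V_2$$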


Part (b) The OLS estimator of $\beta_2$ obtained from (4.4.1) is given in
Question (2.7) as

$$\hat\beta_2 = (X_2'Q_1X_2)^{-1}X_2'Q_1y$$

where $Q_1 = I - X_1(X_1'X_1)^{-1}X_1'$. The covariance matrix of $\hat\beta_2$ is

$$\bar{V}_2 = \sigma^2(X_2'Q_1X_2)^{-1} = \sigma^2(X_2'X_2)^{-1} + \sigma^2(X_2'X_2)^{-1}X_2'X_1(X_1'Q_2X_1)^{-1}X_1'X_2(X_2'X_2)^{-1} \tag{4.4.5}$$

where $Q_2 = I - X_2(X_2'X_2)^{-1}X_2'$. We can compare $\bar{V}_2$ with $V_2$, the
covariance matrix of $b_2$ from Part (a), by considering

$$\bar{V}_2 - V_2 = (X_2'X_2)^{-1}X_2'X_1[\sigma^2(X_1'Q_2X_1)^{-1} - V_1]X_1'X_2(X_2'X_2)^{-1}$$

The difference $\bar{V}_2 - V_2$ is positive semi-definite and hence $b_2$ is at least as
efficient as $\hat\beta_2$ if $\sigma^2(X_1'Q_2X_1)^{-1} - V_1$ is positive semi-definite. In other
words, the more precise the extraneous estimator, the smaller is $V_1$ and
hence the more efficient is $b_2$ relative to $\hat\beta_2$. There is, of course, no
guarantee that $\bar{V}_2 - V_2$ is positive semi-definite. If, due to an imprecise
estimator $b_1$, $\sigma^2(X_1'Q_2X_1)^{-1} - V_1$ is negative semi-definite, then $\hat\beta_2$ is
more efficient than $b_2$.


Part (c) An alternative method of combining the extraneous information
with the sample information (4.4.1) is as follows. Let

$$b_1 = \beta_1 + e_1 \tag{4.4.6}$$

then from the given information on $b_1$, $e_1$ is an error term with $E(e_1) = 0$,
$E(e_1e_1') = V_1$ and $E(e_1u') = 0$. Equations (4.4.1) and (4.4.6) can be
combined to form the single equation

$$\begin{pmatrix} y \\ b_1 \end{pmatrix} = \begin{pmatrix} X_1 & X_2 \\ I & 0 \end{pmatrix}\begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix} + \begin{pmatrix} u \\ e_1 \end{pmatrix}$$

which can be written more compactly as

$$y^* = X^*\beta + u^* \tag{4.4.7}$$

where

$$y^{*\prime} = (y', b_1'), \quad X^* = \begin{pmatrix} X_1 & X_2 \\ I & 0 \end{pmatrix}, \quad \beta' = (\beta_1', \beta_2'), \quad u^{*\prime} = (u', e_1'),$$

$E(u^*) = 0$ and

$$E(u^*u^{*\prime}) = \Omega = \begin{pmatrix} \sigma^2I & 0 \\ 0 & V_1 \end{pmatrix}.$$

An efficient estimator $\beta^*$ of $\beta$ is obtained by applying generalised least
squares to (4.4.7). From Question 2.12 we have, therefore,

$$\beta^* = (X^{*\prime}\Omega^{-1}X^*)^{-1}X^{*\prime}\Omega^{-1}y^* \tag{4.4.8}$$

$$= \begin{pmatrix} \sigma^{-2}X_1'X_1 + V_1^{-1} & \sigma^{-2}X_1'X_2 \\ \sigma^{-2}X_2'X_1 & \sigma^{-2}X_2'X_2 \end{pmatrix}^{-1}\begin{pmatrix} \sigma^{-2}X_1'y + V_1^{-1}b_1 \\ \sigma^{-2}X_2'y \end{pmatrix}. \tag{4.4.9}$$

Substituting (4.4.7) into (4.4.8),

$$\beta^* = \beta + (X^{*\prime}\Omega^{-1}X^*)^{-1}X^{*\prime}\Omega^{-1}u^*,$$

hence $E(\beta^*) = \beta$ and the covariance matrix of $\beta^*$ is

$$(X^{*\prime}\Omega^{-1}X^*)^{-1} = \begin{pmatrix} \sigma^{-2}X_1'X_1 + V_1^{-1} & \sigma^{-2}X_1'X_2 \\ \sigma^{-2}X_2'X_1 & \sigma^{-2}X_2'X_2 \end{pmatrix}^{-1}. \tag{4.4.10}$$

Using the result in question (2.1) on the inverse of a partitioned matrix,
the covariance matrix of $\beta_2^*$, the estimate of $\beta_2$, can be shown to be the
bottom right-hand element of (4.4.10), which is

$$V_2^* = \sigma^2(X_2'X_2)^{-1} + (X_2'X_2)^{-1}X_2'X_1HX_1'X_2(X_2'X_2)^{-1},$$

where

$$H = \sigma^2[(X_1'X_1 + \sigma^2V_1^{-1}) - X_1'X_2(X_2'X_2)^{-1}X_2'X_1]^{-1} = \sigma^2[X_1'Q_2X_1 + \sigma^2V_1^{-1}]^{-1}.$$

Comparing $V_2^*$ with $V_2$ we obtain

$$V_2 - V_2^* = (X_2'X_2)^{-1}X_2'X_1(V_1 - H)X_1'X_2(X_2'X_2)^{-1}$$

which is positive semi-definite if $V_1 - H$ is positive semi-definite.

Using the result that $B^{-1} - A^{-1}$ is positive semi-definite if $A - B$ is
positive semi-definite and $B$ is positive definite (Goldberger, 1964, p. 38)
we can deduce that, since $V_1^{-1}$ is positive definite, $V_1 - H$ is positive
semi-definite if $H^{-1} - V_1^{-1}$ is positive semi-definite. But

$$H^{-1} - V_1^{-1} = (\sigma^{-2}X_1'Q_2X_1 + V_1^{-1}) - V_1^{-1} = \sigma^{-2}X_1'Q_2X_1$$

which is positive semi-definite. Hence $V_1 - H$ and $V_2 - V_2^*$ are both
positive semi-definite, implying that $\beta_2^*$ is at least as efficient as the
conditional regression estimator $b_2$.

Comparing $V_2^*$ with $\hat V_2$ we consider

$$\hat V_2 - V_2^* = (X_2'X_2)^{-1}X_2'X_1[\sigma^2(X_1'Q_2X_1)^{-1} - H]X_1'X_2(X_2'X_2)^{-1}$$

which is positive semi-definite if $\sigma^2(X_1'Q_2X_1)^{-1} - H$ is positive
semi-definite, or using the above result, if $H^{-1} - \sigma^{-2}X_1'Q_2X_1$ is positive
semi-definite. But

$$H^{-1} - \sigma^{-2}X_1'Q_2X_1 = (\sigma^{-2}X_1'Q_2X_1 + V_1^{-1}) - \sigma^{-2}X_1'Q_2X_1 = V_1^{-1}$$

which is positive semi-definite. Hence $\hat V_2 - V_2^*$ is positive semi-definite,
implying that $\beta_2^*$ is also at least as efficient as the unrestricted OLS
estimator $\hat\beta_2$.

The estimator $\beta_2^*$ is due to Durbin (1953). See also Goldberger (1964,
pp 258-261).
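
The variance rankings in Part (c) can be illustrated numerically. This is a minimal sketch with arbitrary simulated regressors and an arbitrary positive definite V1 (all dimensions, seeds and the helper `min_eig` are assumptions made for illustration); it builds the three covariance matrices of the estimators of the second coefficient block and inspects the smallest eigenvalue of each difference.

```python
import numpy as np

rng = np.random.default_rng(1)
T, k1, k2 = 50, 2, 3
X1 = rng.normal(size=(T, k1))
X2 = rng.normal(size=(T, k2)) + X1[:, [0]]     # some correlation between blocks
sigma2 = 1.5
A = rng.normal(size=(k1, k1))
V1 = A @ A.T + 0.1 * np.eye(k1)                # covariance of the extraneous b1

S22_inv = np.linalg.inv(X2.T @ X2)
C = S22_inv @ X2.T @ X1                        # (X2'X2)^{-1} X2'X1
Q2 = np.eye(T) - X2 @ S22_inv @ X2.T
G = X1.T @ Q2 @ X1                             # X1'Q2 X1

V2_cond = sigma2 * S22_inv + C @ V1 @ C.T                        # conditional regression
V2_ols = sigma2 * S22_inv + sigma2 * C @ np.linalg.inv(G) @ C.T  # unrestricted OLS
H = sigma2 * np.linalg.inv(G + sigma2 * np.linalg.inv(V1))
V2_star = sigma2 * S22_inv + C @ H @ C.T                         # Durbin's estimator

def min_eig(M):
    """Smallest eigenvalue of the symmetrised matrix."""
    return np.linalg.eigvalsh((M + M.T) / 2).min()

print(min_eig(V2_cond - V2_star), min_eig(V2_ols - V2_star))
```

Both smallest eigenvalues are non-negative up to rounding error, in line with the argument that the mixed estimator is at least as efficient as either alternative.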


Part (d) Suppose that the $X_1$ variables are collinear, then the moment
matrix $X_1'X_1$ is singular. As a result, the unrestricted OLS estimators of
$\beta_1$ and $\beta_2$ cannot be obtained. However, given an extraneous estimator of
$\beta_1$, we can use the conditional regression estimator equation (4.4.3) or
the more efficient estimator equation (4.4.9) due to Durbin. Durbin's
method can easily be extended to cover the case where $X$ is collinear
and extraneous information is available on $\beta_1$ and/or $\beta_2$.

Another ad hoc way of overcoming multicollinearity is to use ridge
regression (see Hoerl and Kennard, 1970). This consists of adding terms
to the leading diagonal of the moment matrix of the explanatory variables,
i.e. to the $X'X$ matrix, to achieve a non-singular matrix. Suppose, however,
that we set $b_1$, the extraneous estimator of $\beta_1$, equal to zero and restrict
$V_1$ to be a diagonal matrix, then (4.4.9) can be written

$$\beta^* = \begin{pmatrix} X_1'X_1 + \sigma^2V_1^{-1} & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{pmatrix}^{-1}\begin{pmatrix} X_1'y \\ X_2'y \end{pmatrix}. \tag{4.4.11}$$

But (4.4.11) has the same form as the ridge regression estimator. We may,
therefore, interpret ridge regression as a special case of the estimator $\beta^*$.
From an econometric point of view, the artificial nature of this
'extraneous' information does not commend the ridge regression
technique.
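
The equivalence between the mixed estimator with a zero prior mean and a ridge-type estimator can be sketched as follows. The data, the value of sigma2 and the diagonal entries of V1 are illustrative assumptions; the code compares GLS on the stacked system with the penalised normal equations.

```python
import numpy as np

rng = np.random.default_rng(2)
T, k1, k2 = 40, 2, 2
X1 = rng.normal(size=(T, k1))
X2 = rng.normal(size=(T, k2))
y = rng.normal(size=T)
sigma2 = 2.0
v = np.array([0.5, 4.0])                       # diagonal of V1 (assumed values)

# GLS on the stacked system y* = X* beta + u*, with b1 set to zero
X_star = np.block([[X1, X2], [np.eye(k1), np.zeros((k1, k2))]])
y_star = np.concatenate([y, np.zeros(k1)])
w = np.concatenate([np.full(T, 1.0 / sigma2), 1.0 / v])   # diagonal of Omega^{-1}
beta_gls = np.linalg.solve(X_star.T @ (w[:, None] * X_star), X_star.T @ (w * y_star))

# Ridge-type form: add sigma^2 V1^{-1} to the X1 block of X'X
X = np.column_stack([X1, X2])
penalty = np.zeros((k1 + k2, k1 + k2))
penalty[:k1, :k1] = np.diag(sigma2 / v)
beta_ridge = np.linalg.solve(X.T @ X + penalty, X.T @ y)

print(beta_gls, beta_ridge)
```

Setting every diagonal entry of V1 to the same value gives the familiar single-parameter ridge penalty on the first block.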


Solution 4.5 


Part (a) Re-write equation (4.5.2) more compactly as

$$y = X\beta + v \tag{4.5.3}$$

where $X = (X_1 : X_2)$ and $\beta' = (\beta_1', \beta_2')$. The OLS estimator $b$ of $\beta$ is clearly
unbiased since $X$ is fixed and, because $\beta_2 = 0$, $v = u$ and hence $E(v) = 0$.
Thus

$$E\begin{pmatrix} b_1 \\ b_2 \end{pmatrix} = \begin{pmatrix} \beta_1 \\ 0 \end{pmatrix}$$

where $b' = (b_1', b_2')$ and hence $E(b_1) = \beta_1$ as required.

The efficiency of this estimator of $\beta_1$ is measured by comparing
var($b_1$) with var($\hat\beta_1$), where $\hat\beta_1$ is the OLS estimator of $\beta_1$ obtained
from (4.5.1); $\hat\beta_1$ is also unbiased. Now the covariance matrix of $b$ is

$$\sigma^2(X'X)^{-1} = \sigma^2\begin{pmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{pmatrix}^{-1}$$

$$= \sigma^2\begin{pmatrix} (X_1'X_1)^{-1} + (X_1'X_1)^{-1}X_1'X_2(X_2'Q_1X_2)^{-1}X_2'X_1(X_1'X_1)^{-1} & -(X_1'X_1)^{-1}X_1'X_2(X_2'Q_1X_2)^{-1} \\ -(X_2'Q_1X_2)^{-1}X_2'X_1(X_1'X_1)^{-1} & (X_2'Q_1X_2)^{-1} \end{pmatrix} \tag{4.5.4}$$

where $Q_1 = I - X_1(X_1'X_1)^{-1}X_1'$. (See Question 2.1 for further details of
this result.) Thus $V_1$, the variance of $b_1$, is the top left-hand element
of the partitioned matrix on the right hand side of (4.5.4). Now the
variance of $\hat\beta_1$ is $\hat V_1 = \sigma^2(X_1'X_1)^{-1}$, hence

$$V_1 = \hat V_1 + \sigma^2(X_1'X_1)^{-1}X_1'X_2(X_2'Q_1X_2)^{-1}X_2'X_1(X_1'X_1)^{-1}. \tag{4.5.5}$$

The last term of (4.5.5) is positive semi-definite since $(X_2'Q_1X_2)^{-1}$
is positive semi-definite. Consequently, $\hat\beta_1$ is at least as efficient an
estimator of $\beta_1$ as $b_1$.
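

Part (b) By definition

$$\hat v = y - Xb = [I - X(X'X)^{-1}X']y.$$

We may substitute for $y$ from (4.5.1) since it is the correct expression for
$y$ and, noting that $X_1 = X(X'X)^{-1}X'X_1$, we obtain

$$\hat v = [I - X(X'X)^{-1}X']u$$

and hence

$$\hat v'\hat v = u'[I - X(X'X)^{-1}X']u.$$

Now

$$E(\hat v'\hat v) = E\{u'[I - X(X'X)^{-1}X']u\}.$$

But $\hat v'\hat v$ is a scalar and so

$$\begin{aligned} E(\hat v'\hat v) &= E(\mathrm{tr}\{u'[I - X(X'X)^{-1}X']u\}) \\ &= E(\mathrm{tr}\{[I - X(X'X)^{-1}X']uu'\}) \\ &= \sigma^2\,\mathrm{tr}[I - X(X'X)^{-1}X'] \\ &= \sigma^2\{\mathrm{tr}\,I - \mathrm{tr}[X(X'X)^{-1}X']\} \\ &= \sigma^2\{T - \mathrm{tr}[(X'X)^{-1}X'X]\} \\ &= \sigma^2(T - 1 - k). \end{aligned}$$

Thus $\hat v'\hat v/(T - 1 - k)$ is unbiased for $\sigma^2$.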
These results suggest that we should consider eliminating variables with 
zero coefficients and re-estimating our model to get more efficient
estimates of the non-zero coefficients. Prima facie it would seem sensible
to eliminate variables whose coefficients are not significantly different 
from zero. When eliminating these variables it is important to exercise care 
in the choice of the significance level of the test, for the smaller the 
significance level, the greater the probability of eliminating a variable 
whose true coefficient is non-zero. We must be especially careful because 
multi-collinearity, for example, can lead us to remove variables whose true 
coefficients are non-zero; and we would then have biased estimates of the 
remaining coefficients. 
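
Both parts of this solution lend themselves to a quick numerical check. The sketch below uses simulated regressors (dimensions and seed are assumptions for illustration): it verifies that including irrelevant regressors cannot reduce the variance of the estimator of the first coefficient block, and that the trace of the residual-maker matrix equals the number of observations less the number of regressors.

```python
import numpy as np

rng = np.random.default_rng(3)
T, k1, k2 = 30, 2, 2
X1 = np.column_stack([np.ones(T), rng.normal(size=T)])
X2 = rng.normal(size=(T, k2)) + X1[:, [1]]     # irrelevant but correlated regressors
X = np.column_stack([X1, X2])
sigma2 = 1.0

V1_hat = sigma2 * np.linalg.inv(X1.T @ X1)               # correct model
V1 = (sigma2 * np.linalg.inv(X.T @ X))[:k1, :k1]         # over-specified model, b1 block
excess = V1 - V1_hat                                     # last term of (4.5.5)

M = np.eye(T) - X @ np.linalg.inv(X.T @ X) @ X.T         # residual-maker matrix
print(np.linalg.eigvalsh(excess).min(), np.trace(M), T - X.shape[1])
```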


Solution 4.6 


Part (a) Consider the quarterly model

$$y_t = \alpha + x_t\beta + \sum_{i=1}^{4}S_{it}\delta_i + u_t \qquad (t = 1, 2, \ldots, T) \tag{4.6.2}$$

where $S_{it} = 1$ in the $i$th quarter and 0 otherwise. Since the seasonal
variation should contribute nothing to the constant term over time

$$\sum_{i=1}^{4}\delta_i = 0. \tag{4.6.3}$$

In matrix form equation (4.6.2) becomes

$$\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ \vdots \\ y_T \end{pmatrix} = \begin{pmatrix} 1 & x_1 & 1 & 0 & 0 & 0 \\ 1 & x_2 & 0 & 1 & 0 & 0 \\ 1 & x_3 & 0 & 0 & 1 & 0 \\ 1 & x_4 & 0 & 0 & 0 & 1 \\ 1 & x_5 & 1 & 0 & 0 & 0 \\ \vdots & \vdots & & & & \vdots \\ 1 & x_T & 0 & 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} \alpha \\ \beta \\ \delta_1 \\ \delta_2 \\ \delta_3 \\ \delta_4 \end{pmatrix} + \begin{pmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \\ u_5 \\ \vdots \\ u_T \end{pmatrix} \tag{4.6.4}$$

or

$$y = X\theta + u. \tag{4.6.5}$$

Equation (4.6.5) cannot be estimated as it stands since $X$ is not of full
column rank: the first and the last four columns of $X$ are linearly
dependent. We must, therefore, impose the constraint given by equation
(4.6.3). This can be done in several ways, either (i) by dropping one of
the linearly dependent columns, or (ii) by substituting for one of the $\delta_i$
using (4.6.3), or (iii) by using restricted least squares with the constraint
(see Question 2.5):

$$(0, 0, 1, 1, 1, 1)\,\theta = 0.$$

The disadvantage of (iii) is that it involves a more complicated method of
estimation than OLS; (i) and (ii) enable OLS to be used but only (ii)
preserves the original coefficients.

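Using (ii), therefore, we substitute in (4.6.2) $\delta_4 = -(\delta_1 + \delta_2 + \delta_3)$ to
get

$$y_t = \alpha + x_t\beta + \sum_{i=1}^{3}(S_{it} - S_{4t})\delta_i + u_t.$$

Thus (4.6.4) becomes

$$\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ \vdots \\ y_T \end{pmatrix} = \begin{pmatrix} 1 & x_1 & 1 & 0 & 0 \\ 1 & x_2 & 0 & 1 & 0 \\ 1 & x_3 & 0 & 0 & 1 \\ 1 & x_4 & -1 & -1 & -1 \\ 1 & x_5 & 1 & 0 & 0 \\ \vdots & \vdots & & & \vdots \\ 1 & x_T & -1 & -1 & -1 \end{pmatrix}\begin{pmatrix} \alpha \\ \beta \\ \delta_1 \\ \delta_2 \\ \delta_3 \end{pmatrix} + \begin{pmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \\ u_5 \\ \vdots \\ u_T \end{pmatrix} \tag{4.6.6}$$

which we can estimate by OLS. $\delta_4$ is estimated by

$$d_4 = -(d_1 + d_2 + d_3)$$

where $d_i$ is the OLS estimator of $\delta_i$ ($i = 1, 2, 3$).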

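Whilst method (i) does not preserve the original coefficients they can,
nonetheless, be readily derived when this method is used. To show this, let
us drop $\delta_4$, say, then (4.6.2) becomes

$$y_t = \alpha^* + x_t\beta + \sum_{i=1}^{3}S_{it}\delta_i^* + u_t. \tag{4.6.7}$$

Equating terms with (4.6.2) we obtain

$$\alpha^* + \delta_i^* = \alpha + \delta_i \qquad (i = 1, 2, 3) \tag{4.6.8}$$

and

$$\alpha^* = \alpha + \delta_4. \tag{4.6.9}$$

We can derive $\alpha$ and $\delta_i$ from (4.6.8) and (4.6.9) as follows: from (4.6.8),

$$3\alpha^* + \sum_{i=1}^{3}\delta_i^* = 3\alpha + \sum_{i=1}^{3}\delta_i$$

and, using (4.6.3) and (4.6.9), this becomes

$$3(\alpha + \delta_4) + \sum_{i=1}^{3}\delta_i^* = 3\alpha - \delta_4$$

or, simplifying,

$$\delta_4 = -\tfrac14\sum_{i=1}^{3}\delta_i^*. \tag{4.6.10}$$

Hence, from (4.6.9),

$$\alpha = \alpha^* - \delta_4 = \alpha^* + \tfrac14\sum_{i=1}^{3}\delta_i^* \tag{4.6.11}$$

and, from (4.6.8),

$$\delta_i = \delta_i^* - \tfrac14\sum_{j=1}^{3}\delta_j^* \qquad (i = 1, 2, 3). \tag{4.6.12}$$

Given estimates of $\alpha^*$ and $\delta_i^*$, equations (4.6.10)-(4.6.12) enable us
to obtain estimates of $\alpha$ and $\delta_i$.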
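
The mapping between the two parameterisations can be verified on simulated quarterly data. This is a minimal sketch (the data-generating values and the seed are assumptions): method (ii) is estimated directly, method (i) is estimated with the fourth dummy dropped, and the starred coefficients are then converted using the relations derived above.

```python
import numpy as np

rng = np.random.default_rng(4)
T = 40                                         # ten years of quarterly data
quarter = np.arange(T) % 4                     # quarter index 0..3
S = np.eye(4)[quarter]                         # T x 4 matrix of seasonal dummies
x = rng.normal(size=T)
y = 5.0 + 2.0 * x + S @ np.array([1.0, -2.0, 0.5, 0.5]) + rng.normal(size=T)

# Method (ii): regress on S_i - S_4, so the seasonal effects sum to zero
Z2 = np.column_stack([np.ones(T), x, S[:, :3] - S[:, [3]]])
a, b, d1, d2, d3 = np.linalg.lstsq(Z2, y, rcond=None)[0]
d = np.array([d1, d2, d3, -(d1 + d2 + d3)])

# Method (i): drop the fourth dummy, then convert the starred coefficients
Z1 = np.column_stack([np.ones(T), x, S[:, :3]])
coef = np.linalg.lstsq(Z1, y, rcond=None)[0]
a_star, d_star = coef[0], coef[2:]
d4 = -d_star.sum() / 4                         # delta_4 = -(1/4) sum of delta_i*
a_from_i = a_star - d4                         # alpha = alpha* - delta_4
d_from_i = np.append(d_star + d4, d4)          # delta_i = delta_i* + delta_4

print(a, a_from_i)
print(d, d_from_i)
```

Because both design matrices span the same column space, the converted estimates coincide with those obtained directly under method (ii).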


Part (b) Method (i) above has been used in equation (4.6.1). The
coefficients of $S_{it}$ in (4.6.1) are estimates of $\delta_i^*$ ($i = 1, 2, 3$). We may
obtain estimates $d_i$ of the seasonal coefficients $\delta_i$ from (4.6.10) and
(4.6.12). Thus $d_1 = -0.61$, $d_2 = 5.51$, $d_3 = -3.87$, $d_4 = -1.04$, and the
estimate of $\alpha$ is seen to be 69.3. (Note: $\sum d_i = -0.01$ which differs from
zero due to rounding errors; keeping more significant figures would have
made $\sum d_i$ closer to zero.)

As the coefficient of the variable $t$ in (4.6.1) is not significant even at a
20% level of significance, these results indicate that there is no trend in
$Y_t$. There is only seasonal variation around the mean of $Y_t$, which is
69.3. In quarter 1, $Y_t$ takes the expected value 68.7; in quarter 2, 74.8;
in quarter 3, 65.4; in quarter 4, 68.3.
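
The quarterly expected values quoted above follow from adding each seasonal estimate to the estimated constant. A short arithmetic check, using only the figures stated in the text:

```python
alpha_hat = 69.3
d = [-0.61, 5.51, -3.87, -1.04]                # seasonal estimates d_1, ..., d_4

# Expected value in each quarter is the constant plus the seasonal effect
expected = [round(alpha_hat + d_i, 1) for d_i in d]
rounding_gap = round(sum(d), 2)                # differs from zero only by rounding

print(expected, rounding_gap)
```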


Solution 4.7 
Consider the linear model

$$y_t = \sum_{i=1}^{k}x_{ti}\beta_i + \sum_{j=1}^{4}s_{tj}\delta_j + u_t \qquad (t = 1, \ldots, T) \tag{4.7.1}$$

where $s_{tj} = 1$ in the $j$th quarter and zero otherwise. Recall that $\sum_j\delta_j = 0$
(See Question 4.6). Equation (4.7.1) may be written in matrix notation as

$$y = X\beta + S\delta + u \tag{4.7.2}$$

where $X = (x_{ti})$, $S = (s_{tj})$, $\beta = (\beta_1, \beta_2, \ldots, \beta_k)'$, $\delta' = (\delta_1, \delta_2, \delta_3, \delta_4)$.

Let $Q$ be a matrix which deseasonalises (4.7.2). In other words, $Q$ is
chosen so that $QS = 0$. A suitable matrix $Q$ is

$$Q = I - S(S'S)^{-1}S'.$$

If there are exactly $m$ years of data ($T = 4m$) then $S'S = mI_4$ and hence

$$Q = I_T - \frac{1}{m}SS', \tag{4.7.3}$$

where $SS'$ is the $T \times T$ matrix made up of $m \times m$ blocks each equal to $I_4$.
Now

$$Qy = QX\beta + QS\delta + Qu = QX\beta + Qu$$

or,

$$y_s = X_s\beta + u_s, \tag{4.7.4}$$

where $y_s = Qy$, $X_s = QX$ and $u_s = Qu$ are deseasonalised variables. Thus
(4.7.4) is the deseasonalised model corresponding to (4.7.2). The problem
is reduced to showing that the OLS estimator of $\beta$ obtained from (4.7.4) is
identical to that obtained from (4.7.2). From (4.7.3)


$$y_s = Qy = \begin{pmatrix} y_1 - \bar y_1 \\ y_2 - \bar y_2 \\ \vdots \\ y_T - \bar y_4 \end{pmatrix}$$

where $\bar y_j$ is the mean of the $y_t$ in the $j$th quarter. The variables $X_s$ are
similarly defined. Thus all of the deseasonalised variables have simply had
the appropriate quarterly means subtracted in each period from the
original variable.

The OLS estimator of $\beta$ obtained from the deseasonalised model (4.7.4)
is

$$b_s = (X_s'X_s)^{-1}X_s'y_s$$

or, substituting for $y_s$ and $X_s$ and using the fact that $Q$ is symmetric and
idempotent,

$$b_s = (X'QX)^{-1}X'Qy. \tag{4.7.5}$$

Using equation (4.7.2), the OLS estimator of $\beta$ is obtained from

$$\begin{pmatrix} X'X & X'S \\ S'X & S'S \end{pmatrix}\begin{pmatrix} b \\ d \end{pmatrix} = \begin{pmatrix} X'y \\ S'y \end{pmatrix}. \tag{4.7.6}$$

From earlier results (Question 2.9), solving (4.7.6) for $b$ gives

$$b = (X'QX)^{-1}X'Qy \tag{4.7.7}$$

which is identical to equation (4.7.5) and completes our proof.
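
The equivalence just proved can be illustrated with simulated quarterly data. A minimal sketch, assuming exactly m complete years and arbitrary illustrative coefficients:

```python
import numpy as np

rng = np.random.default_rng(5)
m = 12
T = 4 * m                                      # exactly m years of quarterly data
S = np.eye(4)[np.arange(T) % 4]                # seasonal dummies
season_in_x = np.array([[1.0, 0.0], [0.0, 2.0], [-1.0, 1.0], [0.5, -0.5]])
X = rng.normal(size=(T, 2)) + S @ season_in_x  # regressors with their own seasonality
y = X @ np.array([1.0, -1.0]) + S @ np.array([2.0, -1.0, 0.0, -1.0]) + rng.normal(size=T)

Q = np.eye(T) - S @ S.T / m                    # subtracts quarterly means

# OLS on deseasonalised data
b_s = np.linalg.lstsq(Q @ X, Q @ y, rcond=None)[0]

# OLS on the original data with seasonal dummies included
b = np.linalg.lstsq(np.column_stack([X, S]), y, rcond=None)[0][:2]

print(b_s, b)
```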


Solution 4.8 


Part (a) The relationship between $y$ and $x$ is assumed to satisfy

$$y_t = \alpha_1 + x_t\beta_1 + u_t \qquad (x_t \le x_A) \tag{4.8.1}$$

$$y_t = \alpha_2 + x_t\beta_2 + u_t \qquad (x_t > x_A;\ t = 1, 2, \ldots, T) \tag{4.8.2}$$

with $E(u_t) = 0$, $E(u_tu_s) = \sigma^2$ $(t = s)$, $= 0$ $(t \ne s)$. We also assume that
$x_1 < x_2 < \ldots < x_A < \ldots < x_T$ and that we observe $x_A$.

We can rewrite (4.8.2) as

$$y_t = (\alpha_1 + \gamma) + x_t(\beta_1 + \delta) + u_t. \tag{4.8.3}$$

At the point of intersection, (4.8.1) and (4.8.3) are equal hence

$$\alpha_1 + x_A\beta_1 = \alpha_1 + \gamma + x_A(\beta_1 + \delta)$$

or

$$\gamma + x_A\delta = 0. \tag{4.8.4}$$

Equations (4.8.1) and (4.8.3) may be written compactly using dummy
variables as

$$y_t = \alpha_1 + x_t\beta_1 + D_t\gamma + (D_tx_t)\delta + u_t \tag{4.8.5}$$

where $D_t = 0$ $(t \le A)$ and $D_t = 1$ $(t > A)$. We can estimate (4.8.5) using
restricted least squares, with (4.8.4) forming the linear restriction (see
Question 2.5). Alternatively, we can eliminate $\gamma$ from (4.8.5) using
(4.8.4) to obtain

$$y_t = \alpha_1 + x_t\beta_1 + (D_tx_t - D_tx_A)\delta + u_t. \tag{4.8.6}$$

Equation (4.8.6) can be estimated by OLS. The matrix of regressors has
the form

$$\begin{pmatrix} 1 & x_1 & 0 \\ 1 & x_2 & 0 \\ \vdots & \vdots & \vdots \\ 1 & x_A & 0 \\ \vdots & \vdots & \vdots \\ 1 & x_T & x_T - x_A \end{pmatrix}.$$

Part (b) If the relationship has two unrestricted disjoint segments with the
break still at $x_A$, then equation (4.8.5) holds without the restriction
(4.8.4) necessarily being satisfied. We wish to test for the validity of this
restriction. One way of performing the test would be to estimate equation
(4.8.5) and use the estimates of $\gamma$ and $\delta$ obtained to estimate the
restriction $\gamma + x_A\delta$. We would then test the restriction by testing whether
or not this estimate was significantly different from zero. For further
details see Question (2.5).

Alternatively, we can use a likelihood ratio test (Question 2.3) which is
based on estimates of equations (4.8.5) and (4.8.6). If $L(H_0)$ is the
maximum value of the likelihood function on the null hypothesis $H_0$
[equation (4.8.6)], $L(H_1)$ is the corresponding value on the alternative
hypothesis $H_1$ [equation (4.8.5)] and the ratio of these likelihoods is $\lambda =
L(H_0)/L(H_1)$ with $0 \le \lambda \le 1$ then we use the result that $-2\ln\lambda$ has a
limiting $\chi^2$ distribution with, in this case, one degree of freedom due to
the single restriction (4.8.4). (Theil, 1971, p. 396).

If the $u_t$ are independent $N(0, \sigma^2)$ then

$$L(H_i) = (2\pi\hat\sigma_i^2)^{-T/2}\exp(-\hat u_i'\hat u_i/2\hat\sigma_i^2) = (2\pi\hat\sigma_i^2)^{-T/2}\exp(-T/2)$$

as $\hat\sigma_i^2 = \hat u_i'\hat u_i/T$, where $\hat{}$ denotes a maximum likelihood estimator and $i =
0, 1$. Thus

$$-2\ln\lambda = T\ln(\hat\sigma_0^2/\hat\sigma_1^2) \tag{4.8.7}$$

which involves the ratio of the error variance of (4.8.6) to that of (4.8.5).
We reject $H_0$ if $-2\ln\lambda > \chi_1^2(a)$, where $a$ is our significance level.

Since it was not specified in the question as to whether the break point
in the unrestricted case occurs at $x_A$ we may wish to generalise the
alternative to the case of an undetermined break point. Again we can
employ a likelihood ratio test, but this time we must enlarge the
parameter space over which we maximise the likelihood function
associated with equation (4.8.5) by adding a separate parameter $x_A$. We
can do this by choosing a value for $x_A$, forming equation (4.8.5), and then
obtaining the error variance of the resulting equation. We repeat this for
all possible values of $x_A$ ($x_A = x_1, x_2, \ldots, x_T$). We choose that value of
$x_A$ and the corresponding equation which has the minimum error variance,
for this will be the value which maximises the likelihood function. We
compute our test statistic using (4.8.7) once more. This time, as we have
dropped two restrictions on $H_0$, we have two degrees of freedom.


Part (c) If a continuous linear model is the correct relationship between
$y$ and $x$, then $\gamma$ and $\delta$ in equation (4.8.5) will be zero. We can test this by
a conventional F-test applied to (4.8.5).
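
The likelihood ratio test of Part (b) with a known break point can be sketched as follows. The data are simulated from a continuous broken line, and the break position, coefficients and the helper `rss` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
T, A = 60, 30
x = np.sort(rng.uniform(0, 10, size=T))
x_A = x[A - 1]                                 # break at the A-th ordered observation
D = (x > x_A).astype(float)                    # D_t = 1 after the break
y = 1.0 + 0.5 * x + 1.5 * D * (x - x_A) + rng.normal(scale=0.5, size=T)

def rss(Z, y):
    """Residual sum of squares from the OLS regression of y on Z."""
    resid = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return resid @ resid

Z0 = np.column_stack([np.ones(T), x, D * (x - x_A)])   # restricted: segments join at x_A
Z1 = np.column_stack([np.ones(T), x, D, D * x])        # unrestricted: free jump and slope
lr = T * np.log(rss(Z0, y) / rss(Z1, y))               # -2 ln(lambda)

print(lr)   # compare with the chi-squared(1) critical value, about 3.84 at the 5% level
```

Since the restricted regressors lie in the column space of the unrestricted ones, the statistic is never negative.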


Solution 4.9 
Equation (4.9.1) can be written in matrix notation as

$$y = X\beta + u \tag{4.9.2}$$

where $X$ is a $T \times 1$ vector, $E(u) = 0$ and $E(uu') = \Sigma = \sigma^2\,\mathrm{diag}\{x_1^2, \ldots, x_T^2\}$.
The OLS estimator of $\beta$ is

$$b_{OLS} = (X'X)^{-1}X'y = \beta + (X'X)^{-1}X'u$$

hence $E(b_{OLS}) = \beta$ and the covariance matrix of $b_{OLS}$ is

$$\begin{aligned} V_{OLS} &= E(b_{OLS} - \beta)(b_{OLS} - \beta)' \\ &= (X'X)^{-1}X'E(uu')X(X'X)^{-1} \\ &= (X'X)^{-1}X'\Sigma X(X'X)^{-1}. \end{aligned}$$

In the present case $X'X = T$ and $X'\Sigma X = \sigma^2\sum x_t^4$. Thus

$$V_{OLS} = \sigma^2\sum x_t^4/T^2. \tag{4.9.3}$$

The generalised least squares estimator of $\beta$ is (see Theil, 1971, p. 238 and
Question 2.12)

$$b_{GLS} = (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}y$$

with covariance matrix

$$V_{GLS} = (X'\Sigma^{-1}X)^{-1}.$$

In this case,

$$V_{GLS} = \sigma^2/T. \tag{4.9.4}$$

We can now examine the efficiency of $b_{OLS}$ compared to $b_{GLS}$ using the
ratio

$$\frac{V_{GLS}}{V_{OLS}} = \frac{\sigma^2}{T}\cdot\frac{T^2}{\sigma^2\sum x_t^4} = \frac{T}{\sum x_t^4}. \tag{4.9.5}$$

Using the Cauchy inequality,

$$\sum a_i^2\sum b_i^2 \ge \Big(\sum a_ib_i\Big)^2$$

and letting $a_i = x_i^2$ and $b_i = 1$ we obtain

$$T\sum x_t^4 \ge \Big(\sum x_t^2\Big)^2$$

or

$$T\sum x_t^4 \ge T^2$$

or

$$\frac{T}{\sum x_t^4} \le 1.$$

Thus the ratio (4.9.5) is $\le 1$ and so GLS is at least as efficient as OLS.
Suppose we assume incorrectly that $\Sigma = \sigma^2I$. Then we would wrongly
use the formula

$$V^* = \sigma^2(X'X)^{-1}$$

to compute the variance of $b_{OLS}$. In this case

$$V^* = \sigma^2/T. \tag{4.9.6}$$

This happens to equal $V_{GLS}$ obtained above but, of course, we must
remember that (4.9.6) is an incorrect expression. Hence we see that
comparing $V_{OLS}$ with $V^*$ gives us the same result as comparing $V_{OLS}$ with
$V_{GLS}$, i.e.,

$$V^*/V_{OLS} = V_{GLS}/V_{OLS} \le 1.$$

Thus $V^*$ is a downwards biased estimator of the true variance of $b_{OLS}$.
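
A numerical illustration of these variance comparisons is given below. It is a sketch under stated assumptions: a single regressor whose values are rescaled so that X'X = T, as in the text, and error variances proportional to the squared regressor; the regressor values themselves are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(7)
T = 25
x = rng.uniform(0.5, 2.0, size=T)
x *= np.sqrt(T / np.sum(x ** 2))               # rescale so that sum of x_t^2 equals T
sigma2 = 1.0
Sigma = sigma2 * np.diag(x ** 2)               # heteroskedastic error covariance
X = x[:, None]

XtX_inv = np.linalg.inv(X.T @ X)
V_ols = (XtX_inv @ X.T @ Sigma @ X @ XtX_inv).item()         # true variance of OLS
V_gls = np.linalg.inv(X.T @ np.linalg.inv(Sigma) @ X).item() # variance of GLS
V_star = (sigma2 * XtX_inv).item()                           # conventional (incorrect) formula

print(V_ols, V_gls, V_star)
```

The conventional formula coincides with the GLS variance here and understates the true OLS variance.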


Solution 4.10 


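Part (a) Writing (4.10.1) in matrix notation we have

$$y = X\beta + u \tag{4.10.2}$$

with $X$ a $T \times 1$ vector, $E(u) = 0$ and $E(uu') = \Sigma$ where (see Theil, 1971, p. 252)

$$\Sigma = \sigma^2\begin{pmatrix} 1 & \rho & \cdots & \rho^{T-1} \\ \rho & 1 & \cdots & \rho^{T-2} \\ \vdots & & \ddots & \vdots \\ \rho^{T-1} & \rho^{T-2} & \cdots & 1 \end{pmatrix}.$$

From question (4.9), the variance of $b$, the OLS estimator of $\beta$, is given by

$$\mathrm{var}(b) = (X'X)^{-1}X'\Sigma X(X'X)^{-1}. \tag{4.10.3}$$

The question asks for the asymptotic variance of $b$, that is, for the
variance of the limiting distribution of $\sqrt T(b - \beta)$. From the answer to
question (2.16) we find that this is given by

$$\lim_{T\to\infty}T\,\mathrm{var}(b) = \lim_{T\to\infty}\Big(\frac{X'X}{T}\Big)^{-1}\lim_{T\to\infty}\Big(\frac{X'\Sigma X}{T}\Big)\lim_{T\to\infty}\Big(\frac{X'X}{T}\Big)^{-1}. \tag{4.10.4}$$

Now

$$X'\Sigma X = \sigma^2\Big(\sum_{t=1}^{T}x_t^2 + 2\rho\sum_{t=2}^{T}x_tx_{t-1} + 2\rho^2\sum_{t=3}^{T}x_tx_{t-2} + \ldots\Big).$$

Using the given assumptions

$$\lim_{T\to\infty}T^{-1}X'\Sigma X = \sigma^2m_{xx}(1 + 2\rho\alpha + 2\rho^2\alpha^2 + \ldots) = \sigma^2m_{xx}(1 + \rho\alpha)/(1 - \rho\alpha). \tag{4.10.5}$$

From (4.10.4) and (4.10.5) it follows that

$$\lim_{T\to\infty}T\,\mathrm{var}(b) = \frac{\sigma^2(1 + \rho\alpha)}{m_{xx}(1 - \rho\alpha)} \tag{4.10.6}$$

which is our required result.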


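Part (b) If $\alpha = \rho$ then (4.10.6) becomes $[\sigma^2(1 + \rho^2)]/[m_{xx}(1 - \rho^2)]$. The
efficient estimator of $\beta$ is the GLS estimator $b^*$. The variance of $b^*$ is
given by

$$\mathrm{var}(b^*) = (X'\Sigma^{-1}X)^{-1}$$

and the variance of the limiting distribution of $\sqrt T(b^* - \beta)$ is
$\lim_{T\to\infty}T(X'\Sigma^{-1}X)^{-1}$. Using the result (Theil, 1971, p. 252) that

$$\Sigma^{-1} = \frac{1}{\sigma^2(1 - \rho^2)}\begin{pmatrix} 1 & -\rho & 0 & \cdots & 0 \\ -\rho & 1 + \rho^2 & -\rho & \cdots & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & \cdots & -\rho & 1 + \rho^2 & -\rho \\ 0 & \cdots & 0 & -\rho & 1 \end{pmatrix}$$

we can show that

$$X'\Sigma^{-1}X = \frac{1}{\sigma^2(1 - \rho^2)}\Big[(1 + \rho^2)\sum_{t=1}^{T}x_t^2 - \rho^2(x_1^2 + x_T^2) - 2\rho\sum_{t=2}^{T}x_tx_{t-1}\Big].$$

Hence

$$\lim_{T\to\infty}T^{-1}X'\Sigma^{-1}X = \sigma^{-2}(1 - \rho^2)^{-1}m_{xx}(1 + \rho^2 - 2\rho\alpha)$$

and so

$$\lim_{T\to\infty}T\,\mathrm{var}(b^*) = \frac{\sigma^2(1 - \rho^2)}{m_{xx}(1 + \rho^2 - 2\rho\alpha)}. \tag{4.10.7}$$

If $\rho = \alpha$ then (4.10.7) becomes $\sigma^2/m_{xx}$. Hence a measure of the
asymptotic efficiency of $b$ relative to $b^*$ is obtained by the ratio of the variances
of their limiting distributions and is given by

$$\lim_{T\to\infty}\frac{\mathrm{var}(b^*)}{\mathrm{var}(b)} = \frac{\sigma^2}{m_{xx}}\cdot\frac{m_{xx}(1 - \rho^2)}{\sigma^2(1 + \rho^2)} = \frac{1 - \rho^2}{1 + \rho^2}$$

which we were required to prove.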
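
Two ingredients of this solution are easy to check numerically: the tridiagonal form of the inverse of the first-order autoregressive covariance matrix, and the size of the efficiency ratio. The sketch below uses an arbitrary small T and illustrative parameter values; the function name `efficiency` is an assumption for illustration.

```python
import numpy as np

T, rho, sigma2 = 6, 0.6, 2.0
idx = np.arange(T)
Sigma = sigma2 * rho ** np.abs(idx[:, None] - idx[None, :])   # sigma^2 * rho^|t-s|

# Tridiagonal matrix: 1 at the two corners, 1 + rho^2 elsewhere on the diagonal,
# -rho on the first off-diagonals
K = np.diag(np.r_[1.0, np.full(T - 2, 1 + rho ** 2), 1.0]) \
    - rho * (np.eye(T, k=1) + np.eye(T, k=-1))
Sigma_inv = K / (sigma2 * (1 - rho ** 2))

def efficiency(rho):
    """Asymptotic variance ratio var(b*) / var(b) when alpha equals rho."""
    return (1 - rho ** 2) / (1 + rho ** 2)

print(np.allclose(Sigma @ Sigma_inv, np.eye(T)), efficiency(rho))
```

The ratio falls from one at rho = 0 towards zero as rho approaches one in absolute value, so the loss from using OLS grows with the strength of the autocorrelation.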


Solution 4.11 


The Cochrane-Orcutt estimator of $\beta$ and $\rho$ is obtained as follows. First we
combine (4.11.1) and (4.11.2) to obtain

$$(y_t - \rho y_{t-1}) - (x_t - \rho x_{t-1})'\beta = e_t \qquad (t = 2, \ldots, T). \tag{4.11.3}$$

This can also be written as

$$(y_t - x_t'\beta) - \rho(y_{t-1} - x_{t-1}'\beta) = e_t. \tag{4.11.4}$$

We now choose $\beta$ and $\rho$ to minimise $\sum e_t^2$ using the following iterative
scheme:

stage 1: set $\rho = 0$ in (4.11.3) and minimise $\sum e_t^2$ with respect to $\beta$
alone. This is equivalent to using OLS on (4.11.1). Denote the resulting
estimator by $b(1)$.

stage 2: set $\beta = b(1)$ in (4.11.4) and minimise $\sum e_t^2$ with respect to $\rho$
alone. This is equivalent to replacing $u_t$ in (4.11.2) by $y_t - x_t'b(1)$ and
using OLS on the resulting equation. Denote the estimator of $\rho$ that we
obtain by $r(1)$.

stage 3: set $\rho = r(1)$ in (4.11.3) and minimise $\sum e_t^2$ with respect to $\beta$,
obtaining $b(2)$. This is equivalent to replacing $y_t$ and $x_t$ in (4.11.1) by
autoregressive transformed variables and using OLS on the resulting
equation.

stages 4, 5, 6, ...: these repeat stages 2 and 3. Iteration is continued
until the change in $\sum e_t^2$ is sufficiently small or until the largest
percentage change in any estimate is sufficiently small. If $x_t$ is
exogenous, in particular, if $x_t$ contains no lagged dependent variables,
then stages 4, 5, 6, ... are not necessary to obtain full asymptotic
efficiency. Iteration can stop at stage 3.
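
The iterative scheme above can be sketched in a few lines. This is an illustrative implementation, not a definitive one: the function name `cochrane_orcutt`, the convergence rule (change in the estimate of rho) and the simulated data are assumptions, and every stage here uses observations t = 2, ..., T.

```python
import numpy as np

def cochrane_orcutt(y, X, tol=1e-8, max_iter=100):
    """Alternate OLS for beta (given rho) and OLS for rho (given beta)."""
    rho = 0.0                                  # stage 1 starts from rho = 0
    for _ in range(max_iter):
        # beta step: OLS on the quasi-differenced data
        y_s = y[1:] - rho * y[:-1]
        X_s = X[1:] - rho * X[:-1]
        beta = np.linalg.lstsq(X_s, y_s, rcond=None)[0]
        # rho step: regress u_t on u_{t-1}, where u_t = y_t - x_t' beta
        u = y - X @ beta
        rho_new = (u[:-1] @ u[1:]) / (u[:-1] @ u[:-1])
        converged = abs(rho_new - rho) < tol
        rho = rho_new
        if converged:
            break
    return beta, rho

# Simulated example with first-order autoregressive errors
rng = np.random.default_rng(11)
T, rho_true = 2000, 0.7
X = np.column_stack([np.ones(T), rng.normal(size=T)])
e = rng.normal(size=T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = rho_true * u[t - 1] + e[t]
y = X @ np.array([1.0, 2.0]) + u

beta_hat, rho_hat = cochrane_orcutt(y, X)
print(beta_hat, rho_hat)
```

Stopping the loop after its second pass through the beta step corresponds to the non-iterated estimator that ends at stage 3.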


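In order to derive the maximum likelihood estimators of $\beta$, $\rho$ and $\sigma^2$,
we must first construct the appropriate likelihood function. If we assume
that $u_0 = 0$, then the log-likelihood function is

$$\begin{aligned} L(\beta, \rho, \sigma^2; y, x) &= -\frac T2\ln 2\pi - \frac T2\ln\sigma^2 - \frac{1}{2\sigma^2}\sum_{t=1}^{T}e_t^2 \\ &= -\frac T2\ln 2\pi - \frac T2\ln\sigma^2 - \frac{1}{2\sigma^2}\Big[\sum_{t=2}^{T}(y_t - \rho y_{t-1} - x_t'\beta + \rho x_{t-1}'\beta)^2 + (y_1 - x_1'\beta)^2\Big] \end{aligned} \tag{4.11.5}$$

since the Jacobian of the transformation from $(e_1, \ldots, e_T)$ to
$(y_1, \ldots, y_T)$ is

$$\left|\frac{\partial(e_1, \ldots, e_T)}{\partial(y_1, \ldots, y_T)}\right| = \begin{vmatrix} 1 & 0 & \cdots & 0 \\ -\rho & 1 & \cdots & 0 \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & -\rho & 1 \end{vmatrix} = 1.$$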

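Remark Without the assumption that $u_0 = 0$, the log-likelihood function
is

$$L(\beta, \rho, \sigma^2; y, x) = -\frac T2\ln 2\pi + \frac12\ln|\Sigma^{-1}| - \frac12(y_1 - x_1'\beta, \ldots, y_T - x_T'\beta)\,\Sigma^{-1}\,(y_1 - x_1'\beta, \ldots, y_T - x_T'\beta)'$$

where $\Sigma^{-1}$ is defined in Solution (4.10). But, with $\sigma^2$ denoting the variance of $e_t$,

$$\Sigma^{-1} = \sigma^{-2}\begin{pmatrix} 1 & -\rho & 0 & \cdots & 0 \\ -\rho & 1 + \rho^2 & -\rho & \cdots & 0 \\ 0 & -\rho & 1 + \rho^2 & \cdots & 0 \\ \vdots & & & \ddots & -\rho \\ 0 & 0 & \cdots & -\rho & 1 \end{pmatrix} = \sigma^{-2}R'R$$

where

$$R = \begin{pmatrix} \sqrt{1 - \rho^2} & 0 & 0 & \cdots & 0 \\ -\rho & 1 & 0 & \cdots & 0 \\ 0 & -\rho & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & -\rho & 1 \end{pmatrix}$$

(see for instance Dhrymes 1971, pp. 66-8).

It follows that $|\Sigma^{-1}| = \sigma^{-2T}|R|^2 = \sigma^{-2T}(1 - \rho^2)$ and hence the
log-likelihood function can be written

$$L(\beta, \rho, \sigma^2; y, x) = -\frac T2\ln 2\pi - \frac T2\ln\sigma^2 + \frac12\ln(1 - \rho^2) - \frac{1}{2\sigma^2}\Big[\sum_{t=2}^{T}(y_t - \rho y_{t-1} - x_t'\beta + \rho x_{t-1}'\beta)^2 + (1 - \rho^2)(y_1 - x_1'\beta)^2\Big].$$

Using this likelihood function to obtain the maximum likelihood
estimators of $\beta$, $\rho$ and $\sigma^2$ would result in slightly different first order
conditions from those derived below and hence would produce slightly
different numerical estimates. However, as the asymptotic properties of
the two sets of estimators are identical, we shall work with the assumption
that $u_0 = 0$.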


The first order conditions obtained from (4.11.5) are

$$\frac{\partial L}{\partial\beta} = \frac{1}{\hat\sigma^2}\Big[\sum_{t=2}^{T}(x_t - \hat\rho x_{t-1})\hat e_t + x_1(y_1 - x_1'\hat\beta)\Big] = 0 \tag{4.11.6}$$

$$\frac{\partial L}{\partial\rho} = \frac{1}{\hat\sigma^2}\sum_{t=2}^{T}(y_{t-1} - x_{t-1}'\hat\beta)\hat e_t = 0 \tag{4.11.7}$$

$$\frac{\partial L}{\partial\sigma^2} = -\frac{T}{2\hat\sigma^2} + \frac{1}{2\hat\sigma^4}\Big[\sum_{t=2}^{T}\hat e_t^2 + (y_1 - x_1'\hat\beta)^2\Big] = 0 \tag{4.11.8}$$

where $\hat e_t = y_t - \hat\rho y_{t-1} - x_t'\hat\beta + \hat\rho x_{t-1}'\hat\beta$ and $\hat{}$ denotes a MLE. Equations
(4.11.6)-(4.11.8) are $k + 2$ non-linear equations which can be solved for
the unknowns $\hat\beta$, $\hat\rho$ and $\hat\sigma^2$.

Solving equations (4.11.6) and (4.11.7) for $\hat\beta$ and $\hat\rho$ we obtain

$$\hat\beta = \Big[\sum_{t=2}^{T}(x_t - \hat\rho x_{t-1})(x_t - \hat\rho x_{t-1})'\Big]^{-1}\Big\{\sum_{t=2}^{T}(x_t - \hat\rho x_{t-1})(y_t - \hat\rho y_{t-1}) + x_1(y_1 - x_1'\hat\beta)\Big\} \tag{4.11.9}$$

and

$$\hat\rho = \Big[\sum_{t=2}^{T}(y_{t-1} - x_{t-1}'\hat\beta)^2\Big]^{-1}\sum_{t=2}^{T}(y_{t-1} - x_{t-1}'\hat\beta)(y_t - x_t'\hat\beta). \tag{4.11.10}$$

Now the second term within braces in (4.11.9) is $O(T^{-1})$ in probability
and the first is $O(T^{-1/2})$ in probability. Consequently, the asymptotic
distribution of $\hat\beta$ does not depend upon this second term of (4.11.9) but
only on the first term. (See Question (2.16) for further details of the use
of orders in probability.)

Dropping the last term in (4.11.9) we can interpret $\hat\beta$ as the regression
coefficients of $y_t - \hat\rho y_{t-1}$ on $x_t - \hat\rho x_{t-1}$. Similarly, from (4.11.10), $\hat\rho$ is
the regression coefficient of $y_t - x_t'\hat\beta$ on $y_{t-1} - x_{t-1}'\hat\beta$. But these are the
same estimators as derived from the C-O estimator. In order to maximise
the likelihood function, equations (4.11.9) and (4.11.10) must be fully
iterated. Thus, apart from the omitted term, a fully iterated C-O
estimator is identical to the MLE. The limiting distributions of the C-O
and the MLE estimators are identical, hence the C-O estimator is
asymptotically efficient under the usual conditions (Solution 3.7).

To see why the non-iterated C-O estimator, which stops after stage 3,
is asymptotically efficient, we note first that $b(1)$ and $r(1)$ are consistent
estimators of $\beta$ and $\rho$, respectively. Further, $\sqrt T[b(1) - \beta]$ and
$\sqrt T[r(1) - \rho]$ possess limiting distributions and hence are $O(1)$ in
probability.

Consider now the estimator

$$b(2) = \Big\{\sum_{t=2}^{T}[x_t - r(1)x_{t-1}][x_t - r(1)x_{t-1}]'\Big\}^{-1}\sum_{t=2}^{T}[x_t - r(1)x_{t-1}][y_t - r(1)y_{t-1}].$$

Using

$$y_t - r(1)y_{t-1} = [x_t - r(1)x_{t-1}]'\beta + u_t - r(1)u_{t-1}$$

we can show that

$$\sqrt T[b(2) - \beta] = V_T^{-1}\,T^{-1/2}\sum_{t=2}^{T}[x_t - r(1)x_{t-1}][u_t - r(1)u_{t-1}]$$

where

$$V_T = T^{-1}\sum_{t=2}^{T}[x_t - r(1)x_{t-1}][x_t - r(1)x_{t-1}]'.$$

But since plim $r(1) = \rho$, the limiting distribution of $\sqrt T[b(2) - \beta]$ is the
same as that of

$$V^{-1}\,T^{-1/2}\sum_{t=2}^{T}[x_t - r(1)x_{t-1}][u_t - r(1)u_{t-1}]$$

where $V = \mathrm{plim}\,V_T$. Now

$$\begin{aligned} T^{-1/2}\sum_{t=2}^{T}[x_t - r(1)x_{t-1}][u_t - r(1)u_{t-1}] &= T^{-1/2}\sum_{t=2}^{T}\{x_t - \rho x_{t-1} - [r(1) - \rho]x_{t-1}\}\{u_t - \rho u_{t-1} - [r(1) - \rho]u_{t-1}\} \\ &= T^{-1/2}\sum_{t=2}^{T}(x_t - \rho x_{t-1})(u_t - \rho u_{t-1}) + D \end{aligned} \tag{4.11.11}$$

where $D$ is a term of $O(T^{-1/2})$ in probability and the first term on the right
hand side of (4.11.11) is $O(1)$ in probability. Thus asymptotically, $D$ can
be ignored. A similar argument can be made if we use $\hat\beta$ and $\hat\rho$ instead of
$b(1)$ and $r(1)$. Again we would obtain $V$ and (4.11.11) but now with a
new $D$ term which is also $O(T^{-1/2})$ in probability. We have shown,
therefore, that $\sqrt T(\hat\beta - \beta)$ and $\sqrt T[b(2) - \beta]$ have the same limiting
distribution. We can also show that $\sqrt T(\hat\rho - \rho)$ and $\sqrt T[r(2) - \rho]$ have
the same limiting distribution where $r(2)$ is obtained from (4.11.10) using
$b(1)$ and $r(1)$. Thus it is not necessary to fully iterate the C-O estimator
to obtain full asymptotic efficiency in this case. Full iteration is required
when $x_t$ contains lagged values of $y_t$.


Solution 4.12 


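Part (a) The Durbin-Watson statistic is defined as

$$d = \sum_{t=2}^{T}(\hat u_t - \hat u_{t-1})^2\Big/\sum_{t=1}^{T}\hat u_t^2 \tag{4.12.3}$$

where

$$\hat u_t = y_t - x_t'b \tag{4.12.4}$$

is the $t$th OLS residual and $b$ is the OLS estimator of $\beta$. Multiplying out
(4.12.3) we obtain

$$d = \frac{\sum_{t=2}^{T}\hat u_t^2 + \sum_{t=2}^{T}\hat u_{t-1}^2 - 2\sum_{t=2}^{T}\hat u_t\hat u_{t-1}}{\sum_{t=1}^{T}\hat u_t^2}. \tag{4.12.5}$$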

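Introducing matrix notation for (4.12.4) and (4.12.5) we can write

$$\hat u = y - Xb \tag{4.12.6}$$

$$d = 2\Big(1 - \frac{\hat u_{-1}'\hat u}{\hat u_{-1}'\hat u_{-1}}\Big) + R \tag{4.12.7}$$

where $\hat u_{-1}$ is $\hat u$ lagged once and $R$ is $O(T^{-1})$ in probability. (See Question
(2.16) for a discussion of order in probability.)
Writing (4.12.1) and (4.12.2) in matrix notation we obtain

$$y = X\beta + u \tag{4.12.8}$$

and

$$u = \rho u_{-1} + e. \tag{4.12.9}$$

Substituting (4.12.8) and (4.12.9) into (4.12.6) we find that

$$\hat u = u - X(b - \beta) \tag{4.12.10}$$

$$= \rho u_{-1} + e - X(b - \beta) \tag{4.12.11}$$

$$= \rho\hat u_{-1} + e - (X - \rho X_{-1})(b - \beta). \tag{4.12.12}$$

Pre-multiplying (4.12.12) by $\hat u_{-1}'$ we obtain

$$\hat u_{-1}'\hat u = \rho\hat u_{-1}'\hat u_{-1} + \hat u_{-1}'e - \hat u_{-1}'(X - \rho X_{-1})(b - \beta)$$

or

$$\frac{\hat u_{-1}'\hat u}{\hat u_{-1}'\hat u_{-1}} = \rho + \frac{\hat u_{-1}'e}{\hat u_{-1}'\hat u_{-1}} - \frac{\hat u_{-1}'(X - \rho X_{-1})(b - \beta)}{\hat u_{-1}'\hat u_{-1}}. \tag{4.12.13}$$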


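Hence

$$d = 2(1 - \rho) - \frac{2\hat u_{-1}'e}{\hat u_{-1}'\hat u_{-1}} + \frac{2\hat u_{-1}'(X - \rho X_{-1})(b - \beta)}{\hat u_{-1}'\hat u_{-1}} + R. \tag{4.12.14}$$

Taking probability limits of (4.12.14), plim $R = 0$ and

$$\mathrm{plim}\,d = 2(1 - \rho) - \frac{2\,\mathrm{plim}\,T^{-1}\hat u_{-1}'e}{\mathrm{plim}\,T^{-1}\hat u_{-1}'\hat u_{-1}} + \frac{2\,\mathrm{plim}\,[T^{-1}\hat u_{-1}'(X - \rho X_{-1})]\,\mathrm{plim}\,(b - \beta)}{\mathrm{plim}\,T^{-1}\hat u_{-1}'\hat u_{-1}}. \tag{4.12.15}$$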


Now plim $T^{-1}\hat u_{-1}'\hat u_{-1}$ = plim $T^{-1}\hat u'\hat u$ = $\lim_{T\to\infty}T^{-1}E(\hat u'\hat u)$. But from
earlier results, $E(\hat u'\hat u) = (T - K)\sigma^2$. Hence, plim $T^{-1}\hat u_{-1}'\hat u_{-1} = \sigma^2$.
Further, from (4.12.11),

$$\begin{aligned} e'\hat u_{-1} &= e'[\rho u_{-2} + e_{-1} - X_{-1}(b - \beta)] \\ &= \rho e'u_{-2} + e'e_{-1} - e'X_{-1}(b - \beta). \end{aligned} \tag{4.12.16}$$

Successive substitution into (4.12.9) reveals that

$$u = e + \rho e_{-1} + \rho^2e_{-2} + \ldots$$

but since $E(e'e_{-r}) = 0$ for $r > 0$, we have $E(e'u_{-2}) = 0$ and $E(e'e_{-1}) = 0$.
Thus

$$\mathrm{plim}\,\frac1Te'\hat u_{-1} = -\mathrm{plim}\,\frac1Te'X_{-1}\cdot\mathrm{plim}\,(b - \beta). \tag{4.12.17}$$

Finally, we note that $X$ is fixed and $b$ is consistent for $\beta$, hence (4.12.17)
and the last term of (4.12.15) are zero, implying that plim $d = 2(1 - \rho)$,
as required.


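Part (b) When $\rho = 0$, plim $d = 2$. Hence we require the limiting
distribution of $\sqrt T(d - 2)$. For $\rho = 0$, (4.12.14) simplifies to

$$d - 2 = -\frac{2\hat u_{-1}'e}{\hat u_{-1}'\hat u_{-1}} + \frac{2\hat u_{-1}'X(b - \beta)}{\hat u_{-1}'\hat u_{-1}} + R \tag{4.12.18}$$

or, using (4.12.10) and (4.12.16),

$$d - 2 = -\frac{2e_{-1}'e}{\hat u_{-1}'\hat u_{-1}} + \Big\{\frac{2e'X_{-1}(b - \beta)}{\hat u_{-1}'\hat u_{-1}} + \frac{2\hat u_{-1}'X(b - \beta)}{\hat u_{-1}'\hat u_{-1}}\Big\} + R. \tag{4.12.19}$$

By Cramer's theorem (see also Question 2.16) and using (4.12.19), we
deduce that $\sqrt T(d - 2)$ converges to the same limiting distribution as
$-2\sigma^{-2}T^{-1/2}e_{-1}'e$. For, even when multiplied by $\sqrt T$, the term in curly
brackets has a probability limit of zero and hence can be ignored.
The limiting distribution of $T^{-1/2}e_{-1}'e$ is $N(0, \sigma^4)$ since

$$E(T^{-1/2}e_{-1}'e) = T^{-1/2}\sum_{t=2}^{T}E(e_te_{t-1}) = 0$$

and

$$\mathrm{Var}(T^{-1/2}e_{-1}'e) = T^{-1}\sum_{t=2}^{T}E(e_t^2)E(e_{t-1}^2) = \frac{T - 1}{T}\sigma^4 \to \sigma^4.$$

Thus the limiting distribution of $\sqrt T(d - 2)$ is $N(0, 4)$.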
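
The relation between the Durbin-Watson statistic and the first-order autocorrelation of the residuals can be illustrated by simulation. This is a sketch with assumed parameter values and an assumed helper name `durbin_watson`; it computes d from OLS residuals and compares it with 2(1 - r), where r is the regression coefficient of the residual on its own lag.

```python
import numpy as np

def durbin_watson(resid):
    """Sum of squared first differences divided by the sum of squared residuals."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

rng = np.random.default_rng(12)
T, rho = 2000, 0.5
X = np.column_stack([np.ones(T), rng.normal(size=T)])
e = rng.normal(size=T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = rho * u[t - 1] + e[t]               # first-order autoregressive errors
y = X @ np.array([1.0, 2.0]) + u

resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
d = durbin_watson(resid)
r = (resid[:-1] @ resid[1:]) / (resid[:-1] @ resid[:-1])

print(d, 2 * (1 - r), 2 * (1 - rho))
```

The first two printed values differ only by a term that vanishes as T grows, and both are close to 2(1 - rho) in a sample of this size.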


Solution 4.13 


Part (a) In this case the model becomes in effect

$$y_1 = X_1\beta + u_1. \tag{4.13.2}$$

Provided $T_1 > K$ we can estimate (4.13.2) by OLS. The resulting
estimator $b_1$ is clearly unbiased for $\beta$ and has the covariance matrix

$$V_1 = \sigma^2(X_1'X_1)^{-1}. \tag{4.13.3}$$


Part (b) The model now becomes

$$\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}\beta + \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} \tag{4.13.4}$$

with $X_2 = 0$. $b_2$, the OLS estimator of $\beta$ is, therefore,

$$\begin{aligned} b_2 &= (X_1'X_1 + X_2'X_2)^{-1}(X_1'y_1 + X_2'y_2) \\ &= (X_1'X_1)^{-1}X_1'y_1 \end{aligned} \tag{4.13.5}$$

which is the same as $b_1$ and has the same covariance matrix.


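Part (c) In this case the model reverts to (4.13.1) with $y_3 = 0$ and $X_2 = 0$.
$b_3$, the OLS estimator of $\beta$, is

$$\begin{aligned} b_3 &= (X_1'X_1 + X_2'X_2 + X_3'X_3)^{-1}(X_1'y_1 + X_2'y_2 + X_3'y_3) \\ &= (X_1'X_1 + X_3'X_3)^{-1}X_1'y_1. \end{aligned} \tag{4.13.6}$$

To find the properties of $b_3$ we substitute in (4.13.6) for $y_1$ obtaining,

$$b_3 = (X_1'X_1 + X_3'X_3)^{-1}(X_1'X_1\beta + X_1'u_1) \tag{4.13.7}$$

hence,

$$\begin{aligned} E(b_3) &= (X_1'X_1 + X_3'X_3)^{-1}X_1'X_1\beta \\ &= \beta - (X_1'X_1 + X_3'X_3)^{-1}X_3'X_3\beta. \end{aligned} \tag{4.13.8}$$

Thus $b_3$ is a biased estimator of $\beta$. The covariance matrix of $b_3$ is

$$V_3 = \sigma^2(X_1'X_1 + X_3'X_3)^{-1}X_1'X_1(X_1'X_1 + X_3'X_3)^{-1}. \tag{4.13.9}$$
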
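Part (d) The log likelihood function of (4.13.1) satisfies

$$\begin{aligned} L &= \text{const.} - \frac T2\ln\sigma^2 - \frac{1}{2\sigma^2}u'u \\ &= \text{const.} - \frac T2\ln\sigma^2 - \frac{1}{2\sigma^2}[(y_1 - X_1\beta)'(y_1 - X_1\beta) + (y_2 - X_2\beta)'(y_2 - X_2\beta) + (y_3 - X_3\beta)'(y_3 - X_3\beta)]. \end{aligned}$$

Treating $\sigma^2$ as known and differentiating with respect to the unknown $\beta$,
$y_3$ and $X_2$ we obtain

$$\frac{\partial L}{\partial\beta} = \frac{1}{\sigma^2}[X_1'y_1 + X_2'y_2 + X_3'y_3 - (X_1'X_1\beta + X_2'X_2\beta + X_3'X_3\beta)] = 0 \tag{4.13.10}$$

$$\frac{\partial L}{\partial y_3} = -\frac{1}{\sigma^2}(y_3 - X_3\beta) = 0 \tag{4.13.11}$$

and

$$\frac{\partial L}{\partial X_2} = -\frac{1}{\sigma^2}(X_2\beta\beta' - y_2\beta') = 0. \tag{4.13.12}$$

To obtain (4.13.12) we have used the result (Theil, 1971, p. 32)

$$\frac{\partial}{\partial A}(y'Az) = yz' \tag{4.13.13}$$

and also

$$\frac{\partial(y'X'Xz)}{\partial X} = \frac{\partial[(Xy)'Xz]}{\partial X} = X(yz' + zy') \tag{4.13.14}$$

where $y$ and $z$ are column vectors and $A$ and $X$ are matrices.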
Pre-multiplying (4.13.11) by $X_3'$ gives

$$X_3'y_3 = X_3'X_3\beta \tag{4.13.15}$$

and hence the terms with subscript 3 drop out of (4.13.10). Since
(4.13.12) implies that

$$y_2 = X_2\beta, \tag{4.13.16}$$

pre-multiplying (4.13.16) by $X_2'$ reveals that terms with subscript 2 also
disappear from (4.13.10). The maximum likelihood estimator of $\beta$ is thus

$$\hat\beta = (X_1'X_1)^{-1}X_1'y_1 \tag{4.13.17}$$

which is the same as $b_1$, the OLS estimator of $\beta$ obtained from using
observations on $y_1$ and $X_1$ alone.

If $\sigma^2$ is unknown we must estimate it. Differentiating the likelihood
function with respect to $\sigma^2$ we obtain

$$\frac{\partial L}{\partial\sigma^2} = -\frac T2\sigma^{-2} + \frac12\sigma^{-4}[(y_1 - X_1\beta)'(y_1 - X_1\beta) + (y_2 - X_2\beta)'(y_2 - X_2\beta) + (y_3 - X_3\beta)'(y_3 - X_3\beta)] = 0.$$

But $y_2 = X_2\beta$ and $y_3 = X_3\beta$, therefore,

$$\hat\sigma^2 = T^{-1}(y_1 - X_1\hat\beta)'(y_1 - X_1\hat\beta)$$

which is the residual sum of squares obtained from the regression of $y_1$ on
$X_1$ divided by $T$.

It is clear, therefore, that of these estimators of $\beta$, the best is the OLS
estimator obtained using only the complete observations on y and X.
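
The effect of filling in missing values with zeros can be illustrated directly from the formulae above. A minimal sketch with simulated design matrices (block sizes, coefficients and seed are assumptions): setting the missing regressors to zero leaves the OLS estimator unchanged, whereas setting the missing dependent variable to zero shrinks the expected value of the estimator away from the true coefficients.

```python
import numpy as np

rng = np.random.default_rng(13)
K = 2
X1 = rng.normal(size=(30, K))                  # complete observations
X3 = rng.normal(size=(10, K))                  # rows where y is missing
beta = np.array([1.0, -2.0])
y1 = X1 @ beta + rng.normal(size=30)
y2 = rng.normal(size=8)                        # rows where X is missing

# Part (b): zero-filled X2 adds nothing to X'X or X'y
b1 = np.linalg.lstsq(X1, y1, rcond=None)[0]
b2 = np.linalg.lstsq(np.vstack([X1, np.zeros((8, K))]),
                     np.concatenate([y1, y2]), rcond=None)[0]

# Part (c): zero-filled y3 gives a biased estimator
M = np.linalg.inv(X1.T @ X1 + X3.T @ X3)
E_b3 = M @ X1.T @ X1 @ beta                    # expected value of b3
bias = -M @ X3.T @ X3 @ beta                   # bias term

print(b1, b2)
print(E_b3, beta + bias)
```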
Solution 4.14


Part (a) Equation (4.14.1) is sometimes referred to as the covariance
model and is frequently used to pool cross-section and time-series data for
a linear model which has an intercept that varies both with cross-sections
and time periods. To analyse the covariance model we first introduce
dummy variables and then express the resulting model in matrix notation.

Introducing dummy variables for the intercept shifts of equation
(4.14.1) we can write

$$y_{it} = \theta_2z_{2t} + \ldots + \theta_Nz_{Nt} + \phi_2w_{i2} + \ldots + \phi_Tw_{iT} + x_{it}'\beta + u_{it} \tag{4.14.3}$$

where

$z_{it} = 1$, for the $i$th cross-section ($i = 2, \ldots, N$)
$\quad\ = 0$, otherwise
$w_{it} = 1$, for the $t$th time period ($t = 2, \ldots, T$)
$\quad\ = 0$, otherwise

Using matrix notation (4.14.3) becomes

$$y = Z\theta + W\phi + X\beta + u \tag{4.14.4}$$

where $y = (y_{11}, y_{12}, \ldots, y_{NT})'$ is $NT \times 1$, $Z$ is the $NT \times (N - 1)$ matrix of
cross-section dummy variables, $W$ is the $NT \times (T - 1)$ matrix of time period
dummy variables, $X = (x_{11}, x_{12}, \ldots, x_{NT})'$ is $NT \times k$, $u = (u_{11}, u_{12}, \ldots, u_{NT})'$
is $NT \times 1$, $\theta = (\theta_2, \ldots, \theta_N)'$ is $(N - 1) \times 1$ and $\phi = (\phi_2, \ldots, \phi_T)'$ is $(T - 1) \times 1$,

with $E(u) = 0$ and $E(uu') = \sigma^2I_{NT}$. Thus the covariance model is seen to
be a conventional linear model. As such $\theta$, $\phi$ and $\beta$ are estimable by OLS.

Part (b) From equation (4.14.4), the hypotheses $H_0$ to $H_2$ are clearly seen to involve the usual F-tests for sub-sets of coefficients. These are derived in question (2.3).

(i) The test statistic for $H_0$: $\theta = 0$, $\phi = 0$, $\beta \neq 0$ against $H_2$: $\theta, \phi, \beta \neq 0$ is

$$F = \frac{RSS_0 - RSS_2}{RSS_2} \cdot \frac{NT - N - T - k + 2}{N + T - 2} \tag{4.14.5}$$

where $RSS_0$ is the residual sum of squares on $H_0$ and is obtained from the regression of $y$ on $X$; $RSS_2$ is the residual sum of squares on $H_2$ and is obtained from the regression of $y$ on $Z$, $W$ and $X$. The degrees of freedom are the number of observations less the number of estimated coefficients.

(ii) The test statistic for $H_1$: $\theta = 0$; $\phi, \beta \neq 0$ against $H_2$ is

$$F = \frac{RSS_1 - RSS_2}{RSS_2} \cdot \frac{NT - N - T - k + 2}{N - 1} \tag{4.14.6}$$

where $RSS_1$ is the residual sum of squares on $H_1$ and is obtained from the regression of $y$ on $W$ and $X$.

(iii) The test statistic for $H_0$ against $H_1$ is

$$F = \frac{RSS_0 - RSS_1}{RSS_1} \cdot \frac{NT - T - k + 1}{T - 1} \tag{4.14.7}$$
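Part (c)
(i) Expressed in matrix notation, (4.14.2) is

$$y = Z\theta + X\beta + u \tag{4.14.8}$$

where all variables are defined as for (4.14.4) but with $\phi = 0$, $x_{it}' = (1, x_{it})$, $N = 3$ and $T = 4$. The OLS estimators of $\theta$ and $\beta$ are obtained from the normal equations

$$\begin{bmatrix} Z'Z & Z'X \\ X'Z & X'X \end{bmatrix} \begin{bmatrix} \hat{\theta} \\ \hat{\beta} \end{bmatrix} = \begin{bmatrix} Z'y \\ X'y \end{bmatrix} \tag{4.14.9}$$

Substituting for $Z$, $X$, $N$ and $T$ we obtain

$$\begin{bmatrix} 4 & 0 & 4 & \sum_t x_{2t} \\ 0 & 4 & 4 & \sum_t x_{3t} \\ 4 & 4 & 12 & \sum_{it} x_{it} \\ \sum_t x_{2t} & \sum_t x_{3t} & \sum_{it} x_{it} & \sum_{it} x_{it}^2 \end{bmatrix} \begin{bmatrix} \hat{\theta}_2 \\ \hat{\theta}_3 \\ \hat{\beta}_1 \\ \hat{\beta}_2 \end{bmatrix} = \begin{bmatrix} \sum_t y_{2t} \\ \sum_t y_{3t} \\ \sum_{it} y_{it} \\ \sum_{it} x_{it}y_{it} \end{bmatrix}$$

that is,

$$\begin{bmatrix} 4 & 0 & 4 & 21 \\ 0 & 4 & 4 & 14 \\ 4 & 4 & 12 & 40 \\ 21 & 14 & 40 & 182 \end{bmatrix} \begin{bmatrix} \hat{\theta}_2 \\ \hat{\theta}_3 \\ \hat{\beta}_1 \\ \hat{\beta}_2 \end{bmatrix} = \begin{bmatrix} 59 \\ 49 \\ 137 \\ 539 \end{bmatrix} \tag{4.14.10}$$

Hence, the required OLS estimates are

$$\begin{bmatrix} \hat{\theta}_2 \\ \hat{\theta}_3 \\ \hat{\beta}_1 \\ \hat{\beta}_2 \end{bmatrix} = \begin{bmatrix} 1.4697 & 0.7955 & 0.0530 & -0.2424 \\ 0.7955 & 0.8068 & -0.0795 & -0.1364 \\ 0.0530 & -0.0795 & 0.3447 & -0.0758 \\ -0.2424 & -0.1364 & -0.0758 & 0.0606 \end{bmatrix} \begin{bmatrix} 59 \\ 49 \\ 137 \\ 539 \end{bmatrix} = \begin{bmatrix} 2.29 \\ 2.07 \\ 5.62 \\ 1.30 \end{bmatrix} \tag{4.14.11}$$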
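(ii) The test statistic for $H_0$: $\theta_2 = \theta_3 = 0$; $\beta_1, \beta_2 \neq 0$ against $H_2$: $\theta_2, \theta_3, \beta_1, \beta_2 \neq 0$ is

$$F = \frac{RSS_0 - RSS_2}{RSS_2} \cdot \frac{NT - N - k + 1}{N - 1}$$

with $N = 3$, $T = 4$ and $k = 2$. Now

$$RSS_2 = \sum_{it} y_{it}^2 - (\hat{\theta}_2, \hat{\theta}_3, \hat{\beta}_1, \hat{\beta}_2)\Big(\sum_t y_{2t}, \sum_t y_{3t}, \sum_{it} y_{it}, \sum_{it} x_{it}y_{it}\Big)'$$

$$= 1713 - (2.29, 2.07, 5.62, 1.30)(59, 49, 137, 539)' = 1713 - 1708.77 = 4.23.$$

$RSS_0$ is the residual sum of squares of the regression of $y_{it}$ on $x_{it}$ and a constant. Thus

$$RSS_0 = \sum_{it} y_{it}^2 - \Big(\sum_{it} y_{it}, \sum_{it} x_{it}y_{it}\Big) \begin{bmatrix} NT & \sum_{it} x_{it} \\ \sum_{it} x_{it} & \sum_{it} x_{it}^2 \end{bmatrix}^{-1} \begin{bmatrix} \sum_{it} y_{it} \\ \sum_{it} x_{it}y_{it} \end{bmatrix}$$

$$= 1713 - (137, 539) \begin{bmatrix} 12 & 40 \\ 40 & 182 \end{bmatrix}^{-1} \begin{bmatrix} 137 \\ 539 \end{bmatrix} = 1713 - \frac{1}{584}(137, 539) \begin{bmatrix} 182 & -40 \\ -40 & 12 \end{bmatrix} \begin{bmatrix} 137 \\ 539 \end{bmatrix}$$

$$= 1713 - 1703.37 = 9.63.$$

Substituting these figures into the F-value we obtain

$$F = \frac{9.63 - 4.23}{4.23} \cdot \frac{8}{2} = \frac{5.40 \times 8}{4.23 \times 2} = 5.11.$$

Now $F$ is distributed as $F_{2,8}$. But $F_{2,8}(0.05) = 4.46$ and $F_{2,8}(0.025) = 6.06$. Since $4.46 < 5.11 < 6.06$, $H_0$ is rejected at the 5% level of significance but is not rejected at the 2.5% level.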
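The arithmetic of Part (c) can be checked numerically. The following is a minimal sketch in Python with NumPy, not part of the original solution: it solves the normal equations (4.14.10) and recomputes the two residual sums of squares and the F-value. The entry 21 for the sum of the $x_{2t}$ is an assumption, inferred from the printed inverse matrix, and the variable names are illustrative only.

```python
import numpy as np

# Cross-product matrix of (4.14.10); rows and columns are ordered as
# (theta_2, theta_3, beta_1, beta_2). The entry 21 (sum of x_2t) is an
# assumption inferred from the printed inverse matrix.
A = np.array([[4.0, 0.0, 4.0, 21.0],
              [0.0, 4.0, 4.0, 14.0],
              [4.0, 4.0, 12.0, 40.0],
              [21.0, 14.0, 40.0, 182.0]])
rhs = np.array([59.0, 49.0, 137.0, 539.0])
sum_y_squared = 1713.0

# Unrestricted model: cross-section dummies plus constant and x.
coef = np.linalg.solve(A, rhs)
rss_unrestricted = sum_y_squared - coef @ rhs

# Restricted model (theta_2 = theta_3 = 0): constant and x only.
A0, rhs0 = A[2:, 2:], rhs[2:]
coef0 = np.linalg.solve(A0, rhs0)
rss_restricted = sum_y_squared - coef0 @ rhs0

N, T, k = 3, 4, 2
df_num = N - 1                 # number of restrictions
df_den = N * T - N - k + 1     # residual degrees of freedom
F = (rss_restricted - rss_unrestricted) / rss_unrestricted * df_den / df_num
print(coef.round(2), round(rss_unrestricted, 2), round(rss_restricted, 2), round(F, 2))
```

Working with unrounded coefficients gives an F-value of roughly 5.1, slightly below the figure obtained from the rounded residual sums of squares, but still between the two critical values quoted.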



Solution 4.15 


Part (a) Like the covariance model, the error (or variance) components model (Wallace and Hussain, 1969; Maddala, 1971; Nerlove, 1971 and Henderson, 1971) is used in pooling cross-section and time-series data. But, unlike the covariance model, it is assumed that the error term consists of three independent components: a cross-section term $u_i$, a term associated with time $v_t$, and a general term $w_{it}$ which is common to both the cross-section and the time-series data. Thus, in the covariance model, the three error terms $u_i$, $v_t$ and $w_{it}$ can be thought of as having different means but the same variances, whereas in the error components model they have the same (zero) means but different variances.

It has been argued, however, that a major disadvantage of the 
covariance model is that the intercept dummy variables eliminate a large 
part of the variation among both the dependent and independent variables 
if the between cross-section and between time-period variation is high. 
Further, it is often difficult to interpret the dummy variables whose 
inclusion can also use up a large number of degrees of freedom. For these 
reasons the error components model is often the preferred model for 
bringing cross-section and time-series evidence to bear on a model. The 
relationship between the error components and the covariance model is 
discussed further in Question (4.16). 


Part (b)
(i) Writing (4.15.1) and (4.15.2) in matrix notation we have

$$y = X\beta + e \tag{4.15.3}$$

where

$$y = (y_{11}, y_{12}, \dots, y_{NT})', \quad X = (x_{11}, x_{12}, \dots, x_{NT})', \quad e = (e_{11}, e_{12}, \dots, e_{NT})' \quad \text{and} \quad e = Z\mu + w \tag{4.15.4}$$

where $\mu' = (u_1, \dots, u_N, v_1, \dots, v_T)$, $w' = (w_{11}, w_{12}, \dots, w_{NT})$, $Z = (I_N \otimes i_T : i_N \otimes I_T)$ and $i_h$ is an $h \times 1$ vector of ones.

It follows from the assumptions on $u_i$, $v_t$ and $w_{it}$ that $e$ is $N(0, \Omega)$ where

$$\Omega = E(ee') = ZE(\mu\mu')Z' + E(ww') = \sigma_w^2(ZAZ' + I_{NT}) \tag{4.15.5}$$

and

$$A = \sigma_w^{-2}E(\mu\mu') = \sigma_w^{-2}\begin{bmatrix} \sigma_u^2 I_N & 0 \\ 0 & \sigma_v^2 I_T \end{bmatrix} \tag{4.15.6}$$

If $\sigma_u^2/\sigma_w^2$ and $\sigma_v^2/\sigma_w^2$ are known, then $\Omega$ is known up to a factor of proportionality and hence GLS can be used to estimate (4.15.3). That is, if

$$\Omega = \sigma_w^2\Omega^*$$

then the GLS estimator is

$$\tilde{\beta} = (X'\Omega^{*-1}X)^{-1}X'\Omega^{*-1}y \tag{4.15.7}$$


(ii) If $\sigma_u^2/\sigma_w^2$ and $\sigma_v^2/\sigma_w^2$ are unknown then $\Omega^*$ is unknown too and the above estimator cannot be used, or at least not directly. One alternative approach is to use maximum likelihood estimation but this can be computationally fairly burdensome, as it involves the solution of non-linear equations.

A simpler alternative is to use a two-step estimator in which in the first step an estimate of $\beta$ is obtained which is used to estimate $\mu$ and hence $\sigma_u^2/\sigma_w^2$, $\sigma_v^2/\sigma_w^2$ and $\Omega^*$. This latter estimate can be used in (4.15.7) instead of the unknown $\Omega^*$ to provide the second step estimator of $\beta$. We shall describe this estimator in further detail.

A possible first stage estimator of $\beta$ is OLS on (4.15.3). The residuals of this equation are an estimator $\hat{e}$ of $e$, i.e. $\hat{e} = y - Xb$, where $b$ is the OLS estimator of $\beta$. An estimator of $\mu$ is now obtained through replacing $e$ by $\hat{e}$ in equation (4.15.4) and regressing $\hat{e}$ on $Z$. The resulting estimator of $\mu$ is

$$\hat{\mu} = (Z'Z)^{-1}Z'\hat{e}.$$

Now $\mu' = (u_1, \dots, u_N, v_1, \dots, v_T)$ and $\hat{\mu}' = (\hat{u}_1, \dots, \hat{u}_N, \hat{v}_1, \dots, \hat{v}_T)$. We can, therefore, estimate $\sigma_u^2$ and $\sigma_v^2$ by

$$\hat{\sigma}_u^2 = \frac{1}{N}\sum_{i=1}^{N}\hat{u}_i^2$$

and

$$\hat{\sigma}_v^2 = \frac{1}{T}\sum_{t=1}^{T}\hat{v}_t^2$$

respectively. Finally, we can estimate $\sigma_w^2$ by $\hat{\sigma}_w^2 = \hat{w}'\hat{w}/NT$, where $\hat{w}$ is the residual vector $\hat{w} = \hat{e} - Z\hat{\mu}$. We can now obtain estimates of $\sigma_u^2/\sigma_w^2$, $\sigma_v^2/\sigma_w^2$ and hence of $\Omega^*$ as required.
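The two-step procedure can be sketched in a few lines of Python with NumPy. This is an illustrative simulation under assumed values (the sample sizes, variances, coefficient values and the seed are all hypothetical), not the authors' own computation. Because the columns of $Z$ are linearly dependent, the auxiliary regression of the residuals on $Z$ is computed here with a minimum-norm least-squares solution, which is an assumption of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)      # hypothetical seed
N, T = 30, 20                       # hypothetical panel dimensions
beta_true = np.array([1.0, 2.0])    # hypothetical intercept and slope

# Z = (I_N (x) i_T : i_N (x) I_T), observations ordered (1,1), (1,2), ..., (N,T).
i_N, i_T = np.ones((N, 1)), np.ones((T, 1))
Z = np.hstack([np.kron(np.eye(N), i_T), np.kron(i_N, np.eye(T))])

# Simulate the error components e = Z mu + w and the data.
mu = np.concatenate([rng.normal(0.0, 1.0, N), rng.normal(0.0, 1.0, T)])
w = rng.normal(0.0, 1.0, N * T)
X = np.column_stack([np.ones(N * T), rng.normal(size=N * T)])
y = X @ beta_true + Z @ mu + w

# Step 1: OLS on (4.15.3) and its residuals.
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
e_hat = y - X @ b_ols

# Regress the residuals on Z (minimum-norm solution, since Z'Z is singular).
mu_hat = np.linalg.lstsq(Z, e_hat, rcond=None)[0]
u_hat, v_hat = mu_hat[:N], mu_hat[N:]
w_hat = e_hat - Z @ mu_hat

# Variance estimates as described in the text.
s2_u = u_hat @ u_hat / N
s2_v = v_hat @ v_hat / T
s2_w = w_hat @ w_hat / (N * T)

# Step 2: GLS (4.15.7) with the estimated Omega* = Z A Z' + I.
A_hat = np.diag(np.concatenate([np.full(N, s2_u / s2_w), np.full(T, s2_v / s2_w)]))
Omega_star = Z @ A_hat @ Z.T + np.eye(N * T)
Oinv_X = np.linalg.solve(Omega_star, X)
beta_fgls = np.linalg.solve(X.T @ Oinv_X, Oinv_X.T @ y)
print(b_ols, beta_fgls)
```

Forming the full $NT \times NT$ matrix is adequate for small panels such as this one; for large $N$ and $T$ one would exploit the structure of $\Omega^*$ instead.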


Solution 4.16 


Part (a)
(i) If $u_i$ and $v_t$ are constants then (4.16.1) is the covariance model discussed in question (4.14) with the modification that in (4.16.1), in contrast to (4.14.1), the $N$ cross-section and $T$ time-series effects are all included explicitly.
(ii) If the $u_i$ and $v_t$ are random with the properties given, then we have the error components model of question (4.15).


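Part (b) For either model we can write (4.16.1) in matrix notation as

$$y = Z\delta + Gu + Hv + w \tag{4.16.4}$$

where $y = (y_{11}, y_{12}, \dots, y_{NT})'$, $Z$ has typical row $(1, x_{it}')$, $G = I_N \otimes i_T$, $H = i_N \otimes I_T$, $w = (w_{11}, w_{12}, \dots, w_{NT})'$, $\delta = (\alpha, \beta')'$, $u = (u_1, \dots, u_N)'$ and $v = (v_1, \dots, v_T)'$.

We shall consider first the error components model. Denoting the disturbances of (4.16.4) by $\epsilon$, where

$$\epsilon = Gu + Hv + w, \tag{4.16.5}$$

we obtain $E(\epsilon) = 0$ and

$$E(\epsilon\epsilon') = \Omega = \sigma_w^2 I_{NT} + \sigma_u^2 GG' + \sigma_v^2 HH' = \sigma_w^2 I_{NT} + \sigma_u^2 A + \sigma_v^2 B \tag{4.16.6}$$

where we have used the fact that $u$, $v$ and $w$ are independently distributed. $A$ and $B$ are as defined in the question. We note that $\Omega$ satisfies equation (4.16.2).

As $\sigma_w^2$, $\sigma_u^2$ and $\sigma_v^2$ are known, the best linear unbiased estimator of $\delta$ is GLS giving

$$\tilde{\delta} = (Z'\Omega^{-1}Z)^{-1}Z'\Omega^{-1}y \tag{4.16.7}$$

with covariance matrix

$$V_{\tilde{\delta}} = (Z'\Omega^{-1}Z)^{-1} = \begin{bmatrix} i_{NT}'\Omega^{-1}i_{NT} & i_{NT}'\Omega^{-1}X \\ X'\Omega^{-1}i_{NT} & X'\Omega^{-1}X \end{bmatrix}^{-1} \tag{4.16.8}$$

where $Z = (i_{NT} : X)$ and $X' = (x_{11}, \dots, x_{NT})$.

Using the result (4.16.2), we obtain

$$i_{NT}'\Omega^{-1}X = \sigma_w^{-2}\, i_{NT}'(I_{NT} - \gamma_1 A - \gamma_2 B + \gamma_3 J_{NT})X. \tag{4.16.9}$$

Each column of $X$ sums to zero, hence $i_{NT}'X = 0$. Moreover, $J_{NT} = i_{NT}i_{NT}'$, $i_{NT}'A = Ti_{NT}'$ and $i_{NT}'B = Ni_{NT}'$. Therefore, $i_{NT}'\Omega^{-1}X = 0$ and so the matrix $Z'\Omega^{-1}Z$ in (4.16.7) is block diagonal. Consequently,

$$\tilde{\alpha} = (i_{NT}'\Omega^{-1}i_{NT})^{-1}i_{NT}'\Omega^{-1}y$$

and

$$\tilde{\beta} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y. \tag{4.16.10}$$

The covariance matrix of $\tilde{\beta}$ is

$$V_{\tilde{\beta}} = (X'\Omega^{-1}X)^{-1} = \sigma_w^2\left[X'(I_{NT} - \gamma_1 A - \gamma_2 B)X\right]^{-1}, \tag{4.16.11}$$

where we have used the result derived above that $J_{NT}X = 0$.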
Turning now to the covariance model, the best linear unbiased estimator for (4.16.4) is OLS giving

$$\begin{bmatrix} d \\ \hat{u} \\ \hat{v} \end{bmatrix} = \begin{bmatrix} Z'Z & Z'F \\ F'Z & F'F \end{bmatrix}^{-1} \begin{bmatrix} Z'y \\ F'y \end{bmatrix} \tag{4.16.12}$$

where $F = (G : H)$ and hence is identical to that defined in the question. From Question (2.7) we obtain the OLS estimator for a sub-set of coefficients as

$$d = (Z'QZ)^{-1}Z'Qy \tag{4.16.13}$$

where $Q = I_{NT} - F(F'F)^{-1}F'$ and hence satisfies (4.16.3). Furthermore, the covariance matrix of $d$ is

$$V_d = \sigma_w^2(Z'QZ)^{-1} = \sigma_w^2\begin{bmatrix} i_{NT}'Qi_{NT} & i_{NT}'QX \\ X'Qi_{NT} & X'QX \end{bmatrix}^{-1} \tag{4.16.14}$$

Given the definition of $Q$, we can use our earlier results to show that $i_{NT}'QX = 0$ and hence $V_d$ is block diagonal. Consequently, the OLS estimator of $\beta$ is

$$b = (X'QX)^{-1}X'Qy \tag{4.16.15}$$

with covariance matrix

$$V_b = \sigma_w^2(X'QX)^{-1}.$$

Using the definition of $Q$ and the result $J_{NT}X = 0$ we obtain

$$V_b = \sigma_w^2\left[X'\left(I_{NT} - \frac{1}{T}A - \frac{1}{N}B\right)X\right]^{-1} \tag{4.16.16}$$


To prove (i) we note that if $A$ and $B$ are positive definite matrices and if $A - B$ is positive semi-definite, then $B^{-1} - A^{-1}$ is positive semi-definite (Goldberger, 1964, p.38).

We are asked to prove that $V_b - V_{\tilde{\beta}}$ is a positive semi-definite matrix and by the result above this is satisfied if $V_{\tilde{\beta}}^{-1} - V_b^{-1}$ is positive semi-definite. But from (4.16.11) and (4.16.16),

$$V_{\tilde{\beta}}^{-1} - V_b^{-1} = \sigma_w^{-2}X'\left[\left(\frac{1}{T} - \gamma_1\right)A + \left(\frac{1}{N} - \gamma_2\right)B\right]X \tag{4.16.17}$$

Now

$$\frac{1}{T} - \gamma_1 = \frac{\sigma_w^2}{T(\sigma_w^2 + T\sigma_u^2)}$$

and

$$\frac{1}{N} - \gamma_2 = \frac{\sigma_w^2}{N(\sigma_w^2 + N\sigma_v^2)}$$

which are both positive. Since $A = GG'$ and $B = HH'$ are themselves positive semi-definite, (4.16.17) is positive semi-definite, our required result.
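Result (i) can be illustrated numerically. The sketch below, in Python with NumPy, builds $\Omega$ for a small panel, checks that $X'\Omega^{-1}X$ agrees with the expression in (4.16.11) when the columns of $X$ sum to zero, and confirms that $V_b - V_{\tilde{\beta}}$ has no negative eigenvalues. The dimensions, variances and seed are hypothetical, and the values of $\gamma_1$ and $\gamma_2$ are those implied by the two differences displayed above.

```python
import numpy as np

rng = np.random.default_rng(1)           # hypothetical seed
N, T, k = 4, 5, 2                        # hypothetical dimensions
s2_w, s2_u, s2_v = 1.0, 2.0, 0.5         # hypothetical variances

I = np.eye(N * T)
A = np.kron(np.eye(N), np.ones((T, T)))  # A = GG' with G = I_N (x) i_T
B = np.kron(np.ones((N, N)), np.eye(T))  # B = HH' with H = i_N (x) I_T
Omega = s2_w * I + s2_u * A + s2_v * B   # (4.16.6)

# Regressors in deviation form, so that each column sums to zero.
X = rng.normal(size=(N * T, k))
X -= X.mean(axis=0)

# gamma_1 and gamma_2 implied by 1/T - gamma_1 and 1/N - gamma_2 above.
g1 = s2_u / (s2_w + T * s2_u)
g2 = s2_v / (s2_w + N * s2_v)

# X' Omega^{-1} X computed directly and via (4.16.11).
direct = X.T @ np.linalg.solve(Omega, X)
formula = X.T @ (I - g1 * A - g2 * B) @ X / s2_w

V_gls = np.linalg.inv(direct)                                     # (4.16.11)
V_ols = s2_w * np.linalg.inv(X.T @ (I - A / T - B / N) @ X)       # (4.16.16)

diff = V_ols - V_gls
eigenvalues = np.linalg.eigvalsh((diff + diff.T) / 2)
print(eigenvalues)
```

The printed eigenvalues should all be non-negative up to rounding error, which is what positive semi-definiteness of $V_b - V_{\tilde{\beta}}$ requires.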

To prove (ii), we require the covariance matrices of the limiting distributions of $\sqrt{NT}(\tilde{\beta} - \beta)$ and $\sqrt{NT}(b - \beta)$. From (4.16.11) and the above results on $\gamma_1$ and $\gamma_2$ the former is

$$\lim_{N,T\to\infty} NT\,V_{\tilde{\beta}} = \lim_{N,T\to\infty} \sigma_w^2\left[\frac{X'X}{NT} - \gamma_1\frac{X'AX}{NT} - \gamma_2\frac{X'BX}{NT}\right]^{-1}$$

$$= \lim_{N,T\to\infty} \sigma_w^2\left[\frac{X'X}{NT} - \frac{T\sigma_u^2}{\sigma_w^2 + T\sigma_u^2}\cdot\frac{X'AX}{NT^2} - \frac{N\sigma_v^2}{\sigma_w^2 + N\sigma_v^2}\cdot\frac{X'BX}{N^2T}\right]^{-1} \tag{4.16.18}$$

and, from (4.16.16), the latter is

$$\lim_{N,T\to\infty} NT\,V_b = \lim_{N,T\to\infty} \sigma_w^2\left[\frac{X'X}{NT} - \frac{X'AX}{NT^2} - \frac{X'BX}{N^2T}\right]^{-1} \tag{4.16.19}$$

If $N$ and $T \to \infty$ such that $N/T$ is constant, then, given $\lim_{N,T\to\infty} X'X/NT$ is finite and non-singular, both (4.16.18) and (4.16.19) can be shown to reduce to $\lim_{N,T\to\infty} \sigma_w^2(X'X/NT)^{-1}$. Thus, in this case, the two estimators $\tilde{\beta}$ and $b$ are asymptotically equivalent. Despite the remarks made in Question (4.15a) about the advantages of the error components model interpretation over the covariance interpretation, this result suggests that the covariance interpretation might be preferred. As the estimation procedures provide asymptotically equivalent estimators of $\beta$, we would choose between them on the grounds of computational ease. In this case, as the covariance estimator is an OLS estimator and the error components estimator is a GLS estimator, typically with unknown error covariance matrix, we would choose the covariance interpretation. A comparison of small sample properties may, however, lead to a different conclusion.

It is important to note, however, that in general the two estimators $\tilde{\beta}$ and $b$ will not have the same asymptotic distributions. If $N/T$ is not constant as $N$ and $T \to \infty$, or if only one of $N$ and $T \to \infty$, then (4.16.18) and (4.16.19) will not be equal. For example, if $N$ is fixed but $T \to \infty$ in such a way that $\lim_{T\to\infty} X'X/NT$, $\lim_{T\to\infty} X'AX/NT$ and $\lim_{T\to\infty} X'BX/NT$ are all finite then (4.16.18) becomes

$$\lim_{T\to\infty} \sigma_w^2\left[\frac{X'X}{NT} - \frac{X'AX}{NT^2} - \frac{N\sigma_v^2}{\sigma_w^2 + N\sigma_v^2}\cdot\frac{X'BX}{N^2T}\right]^{-1} \tag{4.16.20}$$

and (4.16.19) becomes

$$\lim_{T\to\infty} \sigma_w^2\left[\frac{X'X}{NT} - \frac{X'AX}{NT^2} - \frac{X'BX}{N^2T}\right]^{-1} \tag{4.16.21}$$

The difference (4.16.21) − (4.16.20) is a positive semi-definite matrix. In this case the two estimators $\tilde{\beta}$ and $b$ do not have the same asymptotic distributions.


Solution 4.17 


Part (a) Aggregating (4.17.1) over the $N$ micro-units we obtain

$$\sum_{i=1}^{N} y_{it} = \sum_{i=1}^{N} x_{it}'\beta_i + \sum_{i=1}^{N} z_{it}\gamma + \sum_{i=1}^{N} u_{it}$$

or,

$$y_t = x_t'\beta_t + z_t\bar{\gamma} + u_t \tag{4.17.3}$$

where the $j$th element of $\beta_t$ is $\beta_{tj} = \sum_i x_{itj}\beta_{ij} / \sum_i x_{itj}$ ($x_{itj}$ is the $j$th element of the column vector $x_{it}$), $\bar{\gamma} = \gamma$ and $u_t = \sum_{i=1}^{N} u_{it}$.

In other words, whereas the macro-coefficient $\bar{\gamma}$ of $z_t$ in (4.17.3) equals the micro $\gamma$ and corresponds to equation (4.17.2), the macro-coefficient $\beta_t$ of $x_t$ in (4.17.3) does not correspond to $\beta$ in (4.17.2). As indicated by the subscript $t$, $\beta_t$ depends on time and is not, in general, constant as is required if (4.17.2) is to be a correct macro-representation of (4.17.1).

$\beta_t$ is seen to be a weighted average of the micro-coefficients $\beta_i$, with weights equal to the corresponding $x_{it}$'s. In other words, $\beta_t$ depends upon the distribution of $x_{it}$ across micro-units. A change in this distribution from one time period to another will usually alter the weighted average of the $\beta_i$'s. It follows that if the distribution of $x_{it}$'s is constant over time then (4.17.2) will be a correct macro-representation of (4.17.1).

Also, if $\beta_i = \beta$ for all $i$, i.e. the micro-units have identical $\beta_i$'s, then $\beta_t = \beta$ and hence (4.17.2) becomes a correct aggregation of (4.17.1). In this case, the distribution of the $x_{it}$'s is irrelevant. Note that this condition is met for the term $z_{it}\gamma$.
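The dependence of the macro-coefficient on the distribution of the regressor across micro-units is easily seen in a small numerical example. The sketch below, in Python with NumPy, treats the case of a single regressor; the function name `macro_coefficient` and all the numbers are hypothetical and chosen only for illustration.

```python
import numpy as np

def macro_coefficient(x, beta):
    """Weighted average of micro-coefficients with weights x_it (single regressor)."""
    x = np.asarray(x, dtype=float)
    beta = np.asarray(beta, dtype=float)
    return (x * beta).sum() / x.sum()

beta_i = [1.0, 3.0]                      # two micro-units with different coefficients

# Same micro-coefficients, two different distributions of x across units.
beta_t_even = macro_coefficient([1.0, 1.0], beta_i)
beta_t_skew = macro_coefficient([3.0, 1.0], beta_i)

# Identical micro-coefficients: the distribution of x no longer matters.
beta_t_common = macro_coefficient([3.0, 1.0], [2.0, 2.0])

print(beta_t_even, beta_t_skew, beta_t_common)
```

With unequal micro-coefficients the macro-coefficient moves from 2.0 to 1.5 when the distribution of the regressor shifts towards the first unit, whereas with a common coefficient it stays at that common value whatever the distribution.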
Part (b) If $\beta_t = \beta$ then $v_t = u_t$ and, since $E(u_t) = \sum_{i=1}^{N} E(u_{it}) = 0$ and $E(u_t u_s) = E\big(\sum_i u_{it}\big)\big(\sum_j u_{js}\big) = N\sigma^2$ for $t = s$ and zero for $t \neq s$, we can estimate (4.17.2) efficiently by OLS.


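Part (c) When (4.17.2) is an incorrect aggregation of (4.17.1) it represents a mis-specification; from (4.17.3) it can be seen that $v_t$ in (4.17.2) satisfies

$$v_t = x_t'(\beta_t - \beta) + u_t \tag{4.17.4}$$

and not $u_t$. From question (4.3) we know that the bias in estimating (4.17.2) by OLS will be

$$E\begin{bmatrix} b \\ c \end{bmatrix} - \begin{bmatrix} \beta \\ \gamma \end{bmatrix} = \begin{bmatrix} X'X & X'Z \\ Z'X & Z'Z \end{bmatrix}^{-1} \begin{bmatrix} \sum_t x_t E(v_t) \\ \sum_t z_t E(v_t) \end{bmatrix}$$

where $b$ and $c$ are the OLS estimators of $\beta$ and $\gamma$, respectively, $X' = (x_1, \dots, x_T)$ and $Z' = (z_1, \dots, z_T)$. Thus the bias is

$$E\begin{bmatrix} b \\ c \end{bmatrix} - \begin{bmatrix} \beta \\ \gamma \end{bmatrix} = \begin{bmatrix} X'X & X'Z \\ Z'X & Z'Z \end{bmatrix}^{-1} \begin{bmatrix} \sum_t x_t x_t'(\beta_t - \beta) \\ \sum_t z_t x_t'(\beta_t - \beta) \end{bmatrix} \tag{4.17.5}$$

In general, therefore, both $b$ and $c$ will be biased estimators.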


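Part (d) The test we require is $H_0$: $\beta_1 = \beta_2 = \dots = \beta_N = \beta$ against $H_1$: $\beta_i$ not all equal. This is a conventional test of the equality of a sub-set of coefficients in different equations (see Question 2.6). On $H_0$, we write equation (4.17.1) as

$$\begin{bmatrix} y_{11} \\ y_{12} \\ \vdots \\ y_{NT} \end{bmatrix} = \begin{bmatrix} x_{11}' & z_{11} \\ x_{12}' & z_{12} \\ \vdots & \vdots \\ x_{NT}' & z_{NT} \end{bmatrix} \begin{bmatrix} \beta \\ \gamma \end{bmatrix} + \begin{bmatrix} u_{11} \\ u_{12} \\ \vdots \\ u_{NT} \end{bmatrix}$$

or

$$y = W\theta + u \tag{4.17.6}$$

where $E(u) = 0$ and $E(uu') = \sigma^2 I_{NT}$. On $H_1$, equation (4.17.1) can be written as

$$\begin{bmatrix} y_{11} \\ y_{12} \\ \vdots \\ y_{NT} \end{bmatrix} = \begin{bmatrix} x_{11}' & 0 & \cdots & 0 & z_{11} \\ x_{12}' & 0 & \cdots & 0 & z_{12} \\ \vdots & & & \vdots & \vdots \\ 0 & 0 & \cdots & x_{NT}' & z_{NT} \end{bmatrix} \begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_N \\ \gamma \end{bmatrix} + \begin{bmatrix} u_{11} \\ u_{12} \\ \vdots \\ u_{NT} \end{bmatrix}$$

or,

$$y = W^*\theta^* + u. \tag{4.17.7}$$

Our test statistic is, therefore,

$$F = \frac{RSS_0 - RSS_1}{RSS_1} \cdot \frac{NT - Nk - 1}{k(N - 1)}$$

where $RSS_0$ and $RSS_1$ are the residual sums of squares of (4.17.6) and (4.17.7), respectively. $F$ is distributed as an $F_{k(N-1),\,NT-Nk-1}$. For an alternative approach to this test see Question 3.12(bii).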


Solution 4.18 


Part (a) We wish to test whether or not the coefficient of $\ln W$ in equation (4.18.1) is significantly different from zero. In the absence of any figure for the number of observations we shall use the limiting normal distribution for our test statistic. This is 0.811/0.051 = 15.9, which is highly significant. We may conclude, therefore, that this evidence supports the hypothesis that variations in output per man in the iron and steel industries of different countries are explained by differences in the money wage rate. Equation (4.18.1) suggests that a 1% rise in money wages leads to a 0.81% rise in output per man.


Part (b) Denote the omitted variable by $X$. Then, if this variable enters linearly, the modified equation can be written as

$$\ln Q = \beta_0 + \beta_1 \ln W + \beta_2 X + u \tag{4.18.2}$$

with $\beta_1, \beta_2 > 0$ and $\text{cov}(\ln W, X) > 0$. If (4.18.2) is correct, the OLS estimates of (4.18.1) will be biased. Assuming $X$ is fixed, the bias of $b_1$, the OLS estimator of $\beta_1$ in (4.18.1), is (see Question 4.3)

$$E(b_1) - \beta_1 = \frac{\beta_2\,\text{cov}(\ln W, X)}{\text{var}(\ln W)},$$

which is clearly positive. Thus the estimate 0.811 can be expected in this case to overestimate $\beta_1$.
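The bias formula can be verified directly. In the sketch below, in Python with NumPy, the regressors are treated as fixed and the short regression is applied to the expected value of the dependent variable, so that the resulting slope is $E(b_1)$. All numerical values, the seed and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)            # hypothetical seed
n = 200
ln_w = rng.normal(size=n)                 # fixed regressor ln W
x_omitted = 0.6 * ln_w + rng.normal(size=n)   # omitted variable, positively related to ln W

beta0, beta1, beta2 = 1.0, 0.5, 0.8       # hypothetical true coefficients

# Expected value of the dependent variable (the disturbance has mean zero).
expected_y = beta0 + beta1 * ln_w + beta2 * x_omitted

# Short regression on a constant and ln W only, applied to E(y): gives E(b_0), E(b_1).
design = np.column_stack([np.ones(n), ln_w])
expected_b = np.linalg.lstsq(design, expected_y, rcond=None)[0]
bias = expected_b[1] - beta1

# Bias formula: beta_2 * cov(ln W, X) / var(ln W), using sample moments.
cov_wx = np.cov(ln_w, x_omitted, bias=True)[0, 1]
var_w = np.var(ln_w)
print(bias, beta2 * cov_wx / var_w)
```

The two printed numbers coincide, and both are positive because the omitted variable is positively correlated with $\ln W$ and has a positive coefficient.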


CHAPTER 5 


Further stochastic models 


0. INTRODUCTION 


Many of the models discussed in earlier chapters differ from the type of 
model we may wish to use in practice in two respects. In the first place, 
they require the equations relating the observable variables to be linear in 
both variables and parameters. Second, they are limited by the condition 
that the explanatory variables in an equation are all exogenous. In the 
present chapter, our questions deal with a number of different stochastic 
models which relax one or other of these requirements. Most of our 
attention is devoted to non-linear regressions and models with errors in 
the variables. 


1. QUESTIONS 


Question 5.1 
In the model

$$y_t = g_t(\alpha^0) + u_t \qquad (t = 1, \dots, T) \tag{5.1.1}$$

$y_t$ is a vector of $n$ observable random variables, $u_t$ is a vector of random disturbances and $g_t(\alpha)$ is a vector of known functions of the unknown $p \times 1$ parameter vector $\alpha$ whose true value is denoted by $\alpha^0$. The elements of $g_t(\alpha)$ also depend on a vector $x_t$ of $m$ non-random exogenous variables. The estimator $\alpha_T(S)$ is defined as the vector $\alpha$ which minimises the quadratic form

$$\sum_{t=1}^{T} [y_t - g_t(\alpha)]'\,S\,[y_t - g_t(\alpha)]$$

where $S$ is a positive definite matrix. Discuss the conditions under which $\alpha_T(S)$ is a consistent estimator of $\alpha^0$.


(Adapted from University of Essex MA examinations, 1973.)

Question 5.2

In the model (5.1.1) it is assumed that the disturbances $u_t$ ($t = 1, \dots, T$) are serially independent, identically distributed normal vectors with zero mean vector and positive definite covariance matrix $\Omega$.

(a) Show that the maximum likelihood estimator of $\alpha^0$ minimises

$$\ln \det\left\{\sum_{t=1}^{T} [y_t - g_t(\alpha)][y_t - g_t(\alpha)]'\right\}$$

considered as a function of $\alpha$.

(b) Show that the maximum likelihood estimator of a can be regarded as a 
minimum distance estimator. Does this mean that we can deduce the 
asymptotic properties of maximum likelihood estimators in models such 
as this from the corresponding properties of minimum distance estimators? 


Question 5.3 


(a) In the context of the model (5.1.1) discuss the asymptotic sampling properties of the estimator $\alpha_T(S_T)$ that minimises

$$\sum_{t=1}^{T} [y_t - g_t(\alpha)]'\,S_T\,[y_t - g_t(\alpha)]$$

with respect to $\alpha$, where $S_T$ is a random symmetric matrix which tends in probability to a positive definite matrix $S$. State clearly any assumptions made about the functions $g_t(\alpha)$ and the disturbances $u_t$.

(b) Demonstrate that $\alpha_T(S_T)$ is asymptotically efficient in the class of minimum distance estimators when the matrix $S_T$ tends in probability to $\Omega^{-1}$, the inverse of the covariance matrix of the disturbances.


Question 5.4 
(a) In the model

$$y_t = \alpha^0 x_{1t} + (\exp \alpha^0)\,x_{2t} + u_t \qquad (t = 1, \dots, T) \tag{5.4.1}$$

the $y_t$ ($t = 1, \dots, T$) are observable random variables, the $x_{it}$ are non-random, bounded quantities whose second moment matrix is non-singular and tends to a non-singular limit as $T \to \infty$ and the $u_t$ ($t = 1, \dots, T$) are independent and identically distributed disturbances with mean zero and finite variance.

Show that $\hat{\alpha}$, the estimator of $\alpha^0$ which minimises

$$\sum_{t=1}^{T} \left(y_t - \alpha x_{1t} - e^{\alpha}x_{2t}\right)^2$$

is a consistent estimator of $\alpha^0$.

(b) If the model (5.4.1) is replaced by

$$y_t = \alpha^0 + (\exp \alpha^0)\,t + u_t \qquad (t = 1, \dots, T) \tag{5.4.2}$$

is $\hat{\alpha}$ still consistent? Prove your result.


Question 5.5 
An econometric model is described by the equation system

$$y_t = A(\alpha^0)x_t + u_t \qquad (t = 1, \dots, T)$$

where $y_t$ is an observable random $n$-vector, $x_t$ is an observable non-random $m$-vector, $u_t$ is a vector of disturbances and $A(\alpha^0)$ is an $n \times m$ matrix whose elements are known functions of the unknown $p$-vector of parameters $\alpha^0$.

The matrix $A(\alpha^0)$ is first estimated by an unrestricted least squares regression and we denote by $A^*$ the resulting matrix of regression coefficients. The minimum distance estimator $\alpha^{**}$ is now defined as the vector which minimises

$$\sum_{t=1}^{T} [y_t - A(\alpha)x_t]'\,(M_{uu}^*)^{-1}\,[y_t - A(\alpha)x_t] \tag{5.5.1}$$

where

$$M_{uu}^* = T^{-1}\sum_{t=1}^{T} (y_t - A^*x_t)(y_t - A^*x_t)'.$$

(a) Show that $\alpha^{**}$ minimises

$$\text{tr}\left\{[A^* - A(\alpha)]'\,(M_{uu}^*)^{-1}\,[A^* - A(\alpha)]\,M_{xx}\right\} \tag{5.5.2}$$

where

$$M_{xx} = T^{-1}\sum_{t=1}^{T} x_t x_t'.$$

(b) List a set of assumptions under which $\sqrt{T}(\alpha^{**} - \alpha^0)$ has a limiting normal distribution and write down the covariance matrix of this limiting distribution.

(c) Show that, if the assumptions in (b) are satisfied, $\sqrt{T}[A(\alpha^{**}) - A(\alpha^0)]$ also has a limiting normal distribution.

(d) If $\psi^{**}$ represents the covariance matrix of the limiting distribution of $\sqrt{T}\,\text{vec}[A(\alpha^{**}) - A(\alpha^0)]$ and $\psi^*$ represents the covariance matrix of the limiting distribution of $\sqrt{T}\,\text{vec}[A^* - A(\alpha^0)]$ show that the matrix $\psi^* - \psi^{**}$ is positive semi-definite.

Question 5.6 


The observable random variables $y_t$ ($t = 1, \dots, T$) and non-random quantities $x_{it}$ ($i = 1, 2$; $t = 1, \dots, T$) satisfy the relation

$$y_t = a_1 x_{1t} + a_2 x_{2t} + u_t$$

in which $u_t$ is distributed independently of $t$ with mean zero and variance $\sigma^2$ and $u_s$ and $u_t$ are independent if $s \neq t$. The parameters $a_1$ and $a_2$ satisfy the restriction $a_1^2 = a_2$. The $x_{it}$ are bounded and the moment matrix


is non-singular and tends to a non-singular matrix $M_{xx}$ as $T$ tends to infinity.

(a) Describe an iterative procedure for obtaining estimates $\hat{a}_1$ and $\hat{a}_2$ that satisfy the restriction $\hat{a}_1^2 = \hat{a}_2$ and are such that $\sqrt{T}(\hat{a}_1 - a_1)$ and $\sqrt{T}(\hat{a}_2 - a_2)$ have limiting normal distributions whose variances are at least as small as those of the limiting distributions of $\sqrt{T}(a_1^* - a_1)$ and $\sqrt{T}(a_2^* - a_2)$ respectively, where $a_1^*$ and $a_2^*$ are the unrestricted least squares estimates.

(b) Compare the variances of the limiting distributions of $\sqrt{T}(\hat{a}_1 - a_1)$ and $\sqrt{T}(a_1^* - a_1)$ when $a_1 = 0.5$, $a_2 = 0.25$, $\sigma^2 = 1$ and

(University of Auckland MA examinations, 1969.) 


Question 5.7 

In the system

$$y_{1t} = a_{11}x_{1t} + a_{12}x_{2t} + u_{1t} \tag{5.7.1}$$

$$y_{2t} = a_{21}x_{1t} + a_{22}x_{2t} + u_{2t} \tag{5.7.2}$$

the $y_{it}$ ($i = 1, 2$; $t = 1, \dots, T$) are observable random variables, the $x_{it}$ ($i = 1, 2$; $t = 1, \dots, T$) are observable, non-random, bounded quantities whose second moment matrix converges as $T \to \infty$ to the finite, positive definite matrix $M_{xx}$. The $u_{it}$ ($i = 1, 2$) are serially independent random disturbances which have the same bivariate normal distribution for each value of $t$ with $E(u_{1t}) = E(u_{2t}) = 0$ and second moments given by $E(u_{1t}^2) = 2\sigma^2$, $E(u_{1t}u_{2t}) = \sigma^2$, $E(u_{2t}^2) = \sigma^2$ for all $t$. The $a_{ij}$ are unknown parameters which satisfy the restriction
411442 = 1 (5.7.3)


(a) Briefly outline a procedure for obtaining asymptotically efficient estimates of the coefficients $a_{ij}$ ($i, j = 1, 2$).

(b) Compare the covariance matrix of the limiting distribution of your estimates with that of the least squares estimates when the true values of the parameters are $a_{11} = 1$, $a_{12} = 1$, $a_{21} = 1$, $a_{22} = 2$ and

Question 5.8 


Two unobservable economic variables $Y_t$ and $X_t$ ($t = 1, \dots, T$) are assumed to be related by the equation

$$Y_t = \alpha^0 + \beta^0 X_t \qquad (t = 1, \dots, T) \tag{5.8.1}$$

in which $\alpha^0$ and $\beta^0$ are unknown parameters. Observable variables $y_t$ and $x_t$ are known to be related to $Y_t$ and $X_t$ according to

$$y_t = Y_t + u_t \qquad (t = 1, \dots, T) \tag{5.8.2}$$

$$x_t = X_t + v_t \qquad (t = 1, \dots, T) \tag{5.8.3}$$

where $u_t$ and $v_t$ are serially independent random disturbances which are distributed independently of $t$ and of $Y_s$ and $X_s$ ($s = 1, \dots, T$), with zero means and second moments given by

$$E(u_t^2) = \sigma^2, \quad E(v_t^2) = \sigma^2, \quad E(u_t v_t) = 0 \quad \text{for all } t.$$

(a) Show that, under certain conditions, an orthogonal regression of the $y_t$ on the $x_t$ yields consistent estimates of $\alpha^0$ and $\beta^0$. Derive an explicit representation of these estimates in terms of the sample moments of the $y_t$ and $x_t$.

(b) Find the orthogonal regression estimates of $\alpha^0$ and $\beta^0$ given the sample means $\bar{y} = 100$, $\bar{x} = 25$ and the following sample second moment matrix in terms of deviations from means:


ay NS 
IN Our 
INE SIN Es 


Question 5.9 


The observable random variables $y_{1t}$ ($t = 1, \dots, T$) are assumed to be related to the unobservable non-random variables $y_{3t}$ ($t = 1, \dots, T$) according to the relation

$$y_{1t} = \gamma y_{3t} + u_{1t} \tag{5.9.1}$$

where $\gamma$ is an unknown scalar parameter. Observable random variables $y_{2t}$ ($t = 1, \dots, T$) are known to be related to the $y_{3t}$ ($t = 1, \dots, T$) by the equation

$$y_{2t} = y_{3t} + u_{2t} \tag{5.9.2}$$

and the $y_{3t}$ are determined by

$$y_{3t} = x_t'\beta \qquad (t = 1, \dots, T) \tag{5.9.3}$$

where $x_t$ is a vector of $k$ non-random exogenous variables and $\beta$ is a vector of unknown parameters. In (5.9.1) and (5.9.2) the $u_{it}$ are random disturbances whose first and second moments are given by:

$$E(u_{1t}) = E(u_{2t}) = 0 \qquad (t = 1, \dots, T)$$

$$E(u_{1t}^2) = \omega_{11}, \qquad E(u_{2t}^2) = \omega_{22}$$

$$E(u_{it}u_{js}) = 0 \quad \text{for all } i, j, t \text{ and } s \text{ satisfying } i \neq j \text{ or } t \neq s.$$

It is further assumed that the limit as $T \to \infty$ of the matrix

$$M_{xx} = T^{-1}\sum_{t=1}^{T} x_t x_t'$$

exists and is positive definite.

(a) Show that a consistent estimator of $\gamma$ is obtained from the regression of $y_{1t}$ on $\hat{y}_{2t}$ where $\hat{y}_{2t}$ is the calculated value in the regression of $y_{2t}$ on $x_t$.

(b) Can you suggest any improvements in the procedure described in Part (a)?


Question 5.10 


In the vintage production function 
t 


aN 
=e 


OG) = Be Oe nit)? / e*1(0) do] 5) Ok Oat 


—oo 


(5.10.1) 


$Q(t)$ represents gross output at time $t$, $L(t)$ the total amount of labour employed at time $t$ and $I(v)$ real gross investment at time $v$. $B$, $\alpha$, $\delta$ and $\lambda$ are unknown parameters. The parameter $\lambda$ measures the proportional rate at which new technical knowledge is being embodied in new capital and is called the rate of embodied technical progress.

It has been suggested that $\lambda$ can be estimated from the equation
In eS + sn (0| fro} = In(B) + E = (5.10.2)


where 


$$R(t) = [Q(t)/L(t)^{\alpha}]^{1/(1-\alpha)}$$

by using extraneous estimates of $\alpha$ and $\delta$ to construct a time series for $(\Delta R + \delta R)/I$ where $\Delta R$ is a first difference approximation to the derivative $dR(t)/dt$.

(a) Indicate how the estimating equation (5.10.2) can be derived from (5.10.1).

(b) How satisfactory is this procedure for estimating $\lambda$? Can you suggest any improvements?


2. SUPPLEMENTARY QUESTIONS 


Question 5.11 


The observable random variables $y_t$ ($t = 1, \dots, T$) and non-random quantities $x_{it}$ ($i = 1, 2, 3$; $t = 1, \dots, T$) satisfy the relation

$$y_t = a_1 x_{1t} + a_2 x_{2t} + a_3 x_{3t} + u_t \tag{5.11.1}$$

where $u_t$ is a serially independent random disturbance with the same distribution for all $t$ in which $E(u_t) = 0$ and $E(u_t^2) = \sigma^2$. The parameters
a, anda, are known to satisfy the restriction 


a;a, = 43 (5.012) 


It is assumed that the $x_{it}$ are bounded and the sample second moment matrix of the $x_{it}$ is non-singular and tends to a non-singular matrix $M$ as the sample size $T \to \infty$.

(a) Suggest a procedure for obtaining estimates $\hat{a}_1$, $\hat{a}_2$ and $\hat{a}_3$ of the parameters in (5.11.1) that satisfy the restriction (5.11.2) and are such that the $\sqrt{T}(\hat{a}_i - a_i)$ have limiting normal distributions whose variances are at least as small as those of the limiting distributions of the $\sqrt{T}(a_i^* - a_i)$, where the $a_i^*$ are the ordinary least squares estimates of the $a_i$.

(b) Compare the variances of the limiting distributions of $\sqrt{T}(\hat{a}_1 - a_1)$ and $\sqrt{T}(a_1^* - a_1)$ when $a_1 = 2$, $a_2 = 4$, $a_3 = 1$ and

Question 5.12


The unobservable non-random variables $y_t^*$ and $x_t^*$ ($t = 1, \dots, T$) satisfy exactly the relation

$$y_t^* = \beta x_t^*$$

where $\beta$ is an unknown parameter. The observable random quantities $y_t$ and $x_t$ ($t = 1, \dots, T$) are known to be related to $y_t^*$ and $x_t^*$ respectively but involve measurement errors, so that

$$y_t = y_t^* + u_t \quad \text{and} \quad x_t = x_t^* + v_t$$

where $u_t$ and $v_t$ ($t = 1, \dots, T$) are random errors for which

E(u;) = E(u.) = 0 (fo TP) 


E(u?) = E(v?) = o? (f= 515.4 .552) 

and 


E(u,u,) = 0 (t # s) 
Show that the estimator of.6 defined by 


ih ap 
OL irl oat 
t=1 t=1 
tends in probability to B as 
ar 
merece 
t=1 


tends to infinity while T and o? remain fixed. Comment on the significance 
of this result. 


Question 5.13 
An investigator wishes to estimate the model 
y = XB+Wytu (5553 1) 


where y is a J x 1 vector of observations on the endogenous variable, X is 
a matrix of observations on k non-random exogenous variables, W is a 
vector of values of another exogenous variable which is non-random but 
also unobservable and u = (u;) is a vector of random disturbances. 

It is proposed to estimate the parameter vector B in (5.13.1) by two 
different methods: 
(a) by omitting the unobservable variable W in (5.13.1) and regressing + 
on X to obtain £, 
(b) by introducing an observable proxy variable P for W and regressing y 
on X and P to obtain B, the vector of estimated coefficients of the k 
variables in X. 
It is assumed that P is related to W by the equation
P= We +0 


where @ is an unknown parameter and v = (v;) is a vector of disturbances. 
It is further assumed that wu and v are independent and 


P(e) = 8E (03) = 0! @ Sadly AL) 


f ns ; 
I xX ww ex 
Too T TT © TT? © Eb 


are all finite and in addition Myw > 0 and Mxx is positive definite, show 
that 


plim (6 —B) = Mxk MxwyY 


Too 
and 
2 
lim (6 —8) = Pe ee 
a aa O? Mywlcoee) 4 Gy xw 


where Rix is the limit in probability of the coefficient of determination 
in the auxiliary regression of W on X. 
What conclusion can you draw from the large sample bias of 6 and 6? 


(Reference: Wickens, 1972; see also McCallum, 1972 and Aigner, 1975). 


Question 5.14 
In the system 

Yur = By Xyt + Ut 

Yor = BoXae + ure 
iy lo. y hy a {375 he 6 1, se. 2 ely ate true values Of 
the same economic variables (Y and X) for two different microeconomic 
units 1 and 2, respectively. The u;,;(2 = 1,2;t =1,...,T7) are random 
disturbances and the B;(7 = 1, 2) are unknown parameters. Both X and Y 


are measured with error and the measured values of these variables are, 
for the two units, 


Wipe 2 ap Wi, Xue = Xy+Yy (CAS ferent, ) 


and 

Yor = Vor —We Xo = Xoe — Oz (e Paply shoe 1) 
where w;(t = 1,..., 7) and u,(t =1,..., 7) are random measurement 
errors. 


It is assumed that the $X_{it}$ are non-random and that the second moment
matrix


tends, as T > ©, to the positive definite matrix 


is al 
Ma, M22 

The disturbances uz are serially independent and for each value of t have 
the same distribution with E(uz) = 0(¢ = 1, 2), E(u?,) = 07, E(uyua) = 
042 and E(u3,) = 03. The measurement errors w; and 2; are also serially 
independent and for each value of t have the same distribution with E (w;) 
= 0, E(w?) = 03,, E(v,) = 0 and E(v?) = of, respectively. It is further 
assumed that w, and v; are mutually independent and independent of the 
Uit - : 

If b; is the coefficient in the regression of the y;, on the x; (¢ = 1, 2) 
and b is the coefficient in the regression of the aggregate variables y,,; + 
Yo, on the aggregate variables x,, + x verify that 


: Bimii : 
lim 6, = = >a bee aly 
Pes Mii + o2 ( ) 
and . 
pin Ove Wye (4) By 
where 


a! mM 44 + my 
m4 ae M2 ar 2m12 


(a) Do these results suggest that macro-equation regressions offer any 
advantages over micro-equation regressions when the micro-variables are 
subject to measurement error? 

(b) Comment on any simplifications in the structure of the model that 
you may feel are unrealistic. 


(Reference for an extensive analysis of this type of model: Aigner and 
Goldfeld, 1974) 


Question 5.15


For the equation 
$$y = X\beta + u \tag{5.15.1}$$
where E(X'u) = 0 and u is N(0, 07/7), the set of available observations on 
X is the T x k matrix X* with 
De ey es (5:15.2) 
E(V) =0, T'E(VV’) = Q which is positive definite, E(X'V) = 0 and 
E(V'u) =0. 
(a) Show that the regression of y on X* yields an inconsistent estimator of 


B 
(b) It is known that the X matrix satisfies 
X = Zr' 


where the T x n matrix Z is observable, (T >n >k) andmisakxn 
matrix of unknown coefficients. Show that a regression of X* on Z yields 
a consistent estimator of m and hence obtain a consistent estimator {2 of 
$2. Derive an instrumental variable estimator 6 of 6 and if limy-_,..(Z'Z/T) 
exists and is non-singular, prove that B is consistent for B. 

(c) Show that 


BIS. Sane re VICE 
aeons of 
and hence prove that 
5 Ser Ke al eee ey 


yields a consistent estimator of $\beta$.

(University of London BSc (Econ) examinations, 1977.) 


Question 5.16 
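Consider the data generation process given by
$$ y_t = \beta x_t + u_{1t} \qquad (t = 1, \ldots, T) \qquad (5.16.1) $$
$$ x_t = \gamma z_t + u_{2t} \qquad (5.16.2) $$
where $E(x_t u_{1t}) = \alpha \neq 0$, $E(z_t u_{1t}) = E(z_t u_{2t}) = 0$, the $z_t$ are independent
$N(0, \delta)$, the $u_{it}$ are independent $N(0, \sigma_{ii})$ for all $t$ and $E(u_{1t}u_{2t}) = \alpha$.
(a) Derive the following population moments as functions of the
parameters $(\alpha, \beta, \gamma, \delta, \sigma_{11}, \sigma_{22})$:
$E(z_t^2)$, $E(x_t^2)$, $E(x_t z_t)$,
$E(y_t^2)$, $E(y_t x_t)$ and $E(y_t z_t)$.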
(b) Describe appropriate functions of these six population moments which 
are equal to each of the six parameters respectively. Explain how 


consistent estimators of these six parameters can be obtained using sample
data.
(c) Let $x_t^* = x_t - kz_t$, where $k = E(x_t^2)/E(x_t z_t) = \operatorname{plim}(\sum x_t^2 / \sum x_t z_t)$, then
show that $\hat{\alpha} = (1/T)\sum x_t^* y_t = (1/T)\sum x_t^* u_{1t}$ is a consistent estimator of $\alpha$.
(d) Derive the limiting distribution of $\sqrt{T}\hat{\alpha}$ when $\alpha = 0$. [Note: $E(x_t^* u_{1t}) = 0$
if $\alpha = 0$].

(University of London BSc (Econ) examinations, 1977.) 




3. SOLUTIONS 


Solution 5.1 


The given model (5.1.1) is a non-linear regression model with additive
disturbances and the estimator $\alpha_T(S)$ is known as a minimum distance
estimator of $\alpha$. The asymptotic theory of regression in this type of model
is developed in Malinvaud (1970a) and Jennrich (1969) and an excellent 
general treatment of the problem of statistical inference in non-linear 
regression models is given in Chapter 9 of Malinvaud (1970b). More 
recently, this type of model has been the subject of further discussion in 
Gallant (1975a and 1975b), Phillips (1976) and Barnett (1976). Our own 
discussion in this solution will touch on a number of aspects of the 
problem considered in these references but the reader is urged to consult 
these references for a complete discussion. It will be assumed that the 
reader is familiar with some of the basic concepts in modern analysis (such 
as neighbourhoods, closed sets, compactness and the notion of an 
infimum and supremum) and for further reference here the books by 
Rudin (1964) and Dieudonne (1969) are recommended. 

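Malinvaud (1970b) gives a simple and direct result in which $\alpha_T(S)$ is
consistent under the following condition:


Condition 5.1.A For every closed set $\omega$ which does not contain $\alpha^0$

(i) $P[\inf_{\alpha \in \omega} Q_T(\alpha) = 0]$ tends to zero as $T \to \infty$; and

(ii) $P\left[\sup_{\alpha \in \omega} \left| \frac{1}{Q_T(\alpha)} \sum_{t=1}^{T} [g_t(\alpha) - g_t(\alpha^0)]' S u_t \right| > \frac{1}{2}\right]$ tends to zero
as $T \to \infty$, where

$$ Q_T(\alpha) = \sum_{t=1}^{T} [g_t(\alpha) - g_t(\alpha^0)]' S [g_t(\alpha) - g_t(\alpha^0)]. $$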


Despite its generality, Malinvaud's result that $\alpha_T(S)$ is consistent under
Condition 5.1.A is not very helpful as it stands, for it does not spell out
precise conditions on the functions $g_t(\alpha)$ and the disturbances $u_t$ which
will ensure the consistency of $\alpha_T(S)$. Such conditions are, however, given
in the articles by Malinvaud (1970a) and Jennrich (1969), although the
latter deals only with the scalar model.

If we wish to make our conditions on the model more explicit than
Condition 5.1.A above then we must first detail our assumptions about
the disturbances $u_t$ and the possible domain of the parameter vector $\alpha$.
Our remaining conditions then concern the systematic component $g_t(\alpha)$.
We note that the elements of the vector function $g_t(\alpha)$ are functions of
the exogenous variable vector $x_t$ as well as $\alpha$ so that any conditions on
$g_t(\alpha)$ will also imply some conditions on the exogenous variables. We can
now adopt an indirect approach involving conditions on the sequence of
functions $g_t(\alpha)$ or a direct approach detailing sufficient conditions on the
exogenous variable sequence $x_t$. Malinvaud (1970a) uses both approaches
and this is one of the reasons why his article is so valuable. Jennrich
(1969) uses the indirect approach and his elegant treatment of the problem
has formed the basis of much later work (Hannan, 1971; Robinson, 1972;
Phillips, 1976). As might be expected the indirect approach leads to
simpler and more general conditions on the model.

Using the indirect approach, we note from Phillips (1976) that the
following two conditions are sufficient for the consistency of $\alpha_T(S)$ in the
present case:


Condition 5.1.B (i) $\alpha^0$ lies in a compact set $\Phi$ in p-dimensional Euclidean
space.

(ii) The disturbance vectors $\{u_t : t = 1, 2, \ldots\}$ are stochastically
independent and identically distributed with zero mean and positive
definite covariance matrix $\Omega$.

(iii) The elements of $g_t(\alpha)$ are continuous functions on $\Phi$.


Condition 5.1.C

(i) $\lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} g_t(\alpha) g_t(\beta)'$ exists and the convergence is uniform
for all $\alpha, \beta \in \Phi$.

(ii) $\lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} [g_t(\alpha) - g_t(\alpha^0)][g_t(\alpha) - g_t(\alpha^0)]'$ is positive definite
for all $\alpha \neq \alpha^0$ in $\Phi$.
Conditions 5.1.B and 5.1.C are, in fact, sufficient to establish that
$$ P[\lim_{T \to \infty} \alpha_T(S) = \alpha^0] = 1 \qquad (5.1.2) $$
(for a proof along the Jennrich lines see Phillips, 1976). This means that
the sequence $\alpha_T(S)$ converges to $\alpha^0$ with probability one; and (5.1.2) is a
stronger result than

$$ \operatorname{plim}_{T \to \infty} \alpha_T(S) = \alpha^0 \qquad (5.1.3) $$


Remark The concept behind convergence with probability one (or almost
sure convergence as it is sometimes called) is discussed, for instance, in
Rao (1973, p. 110). The fact that this type of convergence implies
convergence in probability [i.e. that (5.1.2) implies (5.1.3) in the present
case] is demonstrated in the same reference. As Rao points out, the
concept that underlies (5.1.2) is more profound than that underlying
(5.1.3). For (5.1.2) tells us that those sequences $\{\alpha_T(S)\}$ which do not
converge to $\alpha^0$ (in the ordinary sense) have zero probability in the space
of all realisations $\{y_1, y_2, \ldots\}$ of the $y_t$ process.

Turning to discuss Conditions 5.1.B and 5.1.C we see that 5.1.B is fairly
conventional: 5.1.B (ii) is a classical assumption on the errors in the
model; whereas 5.1.B (i) and 5.1.B (iii) ensure that the estimator $\alpha_T(S)$
exists and is a properly defined random vector (Lemma 2 in Jennrich,
1969). Condition 5.1.C involves the sequence of functions $g_t(\alpha)$ and,
therefore, implicitly imposes conditions on the components of the model
which make up the systematic part $g_t(\alpha)$. Moreover, part (ii) of Condition
5.1.C involves the identifiability of $\alpha^0$ in $\{g_t(\alpha^0); t = 1, 2, \ldots\}$. We can
illustrate the implications of 5.1.C (i) and 5.1.C (ii) by taking the simple
model in which

$$ g_t(\alpha^0) = A(\alpha^0) x_t \qquad (5.1.4) $$

and $A(\alpha^0)$ is an $n \times m$ matrix whose elements are continuous functions of
the more basic set of parameters $\alpha^0$. Then, the following new condition is
sufficient to ensure that 5.1.C (i) holds.


Condition 5.1.D

$\lim_{T \to \infty} T^{-1} \sum_{t=1}^{T} x_t x_t' = M_{xx}$ exists and is positive definite.


Turning to 5.1.C (ii), we note that under (5.1.4)

$$ \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} [g_t(\alpha) - g_t(\alpha^0)][g_t(\alpha) - g_t(\alpha^0)]' = [A(\alpha) - A(\alpha^0)] M_{xx} [A(\alpha) - A(\alpha^0)]' \qquad (5.1.5) $$

Now, when Condition 5.1.C (ii) holds, the matrix (5.1.5) is a zero matrix
only when $\alpha = \alpha^0$. This means that if $M_{xx}$ is non-singular (as it is under
Condition 5.1.D) then the equation

$$ A(\alpha) - A(\alpha^0) = 0 $$

implies that $\alpha = \alpha^0$. In other words, the equation $A(\alpha) = A(\alpha^0)$ has the
unique solution $\alpha = \alpha^0$; and the parameter vector $\alpha^0$ is identifiable in the
coefficient matrix $A$.

Details of direct conditions on the exogenous variables which, together
with Condition 5.1.B, are sufficient for the consistency of $\alpha_T(S)$ are
discussed in Malinvaud (1970a) and Gallant (1975). The main ideas
behind these conditions are outlined in Malinvaud (1970b, p. 331).


Final Remarks 
(a) Frequently we will be interested in estimators of the same type as
$\alpha_T(S)$ but which minimise

$$ \sum_{t=1}^{T} [y_t - g_t(\alpha)]' S_T [y_t - g_t(\alpha)] $$

where the matrix $S_T$ is positive definite and dependent on $T$ in such a way
that $S_T$ converges to a positive definite matrix $S$ as $T \to \infty$. If $S_T \to S$ with
probability one, then we have the same result for $\alpha_T(S_T)$ as for $\alpha_T(S)$
when Conditions 5.1.B and 5.1.C are satisfied (c.f. Phillips, 1976).

(b) Condition 5.1.C (ii) is stronger than is really necessary. To see this
we need only consider the following two-equation model:

$$ y_{1t} = \alpha x_t + u_{1t} \qquad (5.1.6) $$
$$ y_{2t} = \alpha x_t + u_{2t} \qquad (5.1.7) $$

5.1.C (ii) is now not satisfied because the matrix

$$ \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} [g_t(\alpha) - g_t(\alpha^0)][g_t(\alpha) - g_t(\alpha^0)]' $$

is in this case (under 5.1.D) just

$$ m \begin{bmatrix} (\alpha - \alpha^0)^2 & (\alpha - \alpha^0)^2 \\ (\alpha - \alpha^0)^2 & (\alpha - \alpha^0)^2 \end{bmatrix} $$

where $m = \lim_{T \to \infty} T^{-1} \sum_{t=1}^{T} x_t^2$. This matrix has rank unity and is not
positive definite. On the other hand, it is a zero matrix only when $\alpha = \alpha^0$
and the estimator $\alpha_T(S)$ is certainly consistent [note also that an ordinary
least squares regression on (5.1.6) will produce a consistent estimator but
will neglect the information about the parameter $\alpha$ that is contained in
(5.1.7)].

We can, in fact, replace Condition 5.1.C (ii) by the alternative weaker
condition 5.1.C (ii)*:

$$ \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} [g_t(\alpha) - g_t(\alpha^0)][g_t(\alpha) - g_t(\alpha^0)]' $$

is positive semi-definite for all $\alpha \neq \alpha^0$ in $\Phi$ and is a zero matrix only when
$\alpha = \alpha^0$.
The result we gave earlier in (5.1.2) still holds.


Solution 5.2


Part (a) Since $u_1, u_2, \ldots, u_T$ are independent and normally distributed,
their joint probability density is given by

$$ (2\pi)^{-nT/2} [\det(\Omega)]^{-T/2} \exp\left( -\frac{1}{2} \sum_{t=1}^{T} u_t' \Omega^{-1} u_t \right) \qquad (5.2.1) $$

From (5.2.1) we derive the joint density of the endogenous variables
$y_1, y_2, \ldots, y_T$ first by replacing $u_t$ in (5.2.1) by its definition in terms of
$y_t$ [i.e. $u_t = y_t - g_t(\alpha^0)$]; in Solution 5.1 we used the superscript in $\alpha^0$ to
emphasise the true position of $\alpha$ in the parameter space. In most cases,
this will not be necessary and we will often drop the superscript. We then
multiply (5.2.1) by the Jacobian of the transformation of the $u_t$ into the
$y_t$. Since

$$ \frac{\partial u_t}{\partial y_t'} = I \quad \text{and} \quad \frac{\partial u_t}{\partial y_s'} = 0 \quad (t \neq s) $$

the Jacobian is unity and the joint density of $y_1, \ldots, y_T$ is

$$ (2\pi)^{-nT/2} [\det(\Omega)]^{-T/2} \exp\left\{ -\frac{1}{2} \sum_{t=1}^{T} [y_t - g_t(\alpha)]' \Omega^{-1} [y_t - g_t(\alpha)] \right\} \qquad (5.2.2) $$


For given data on the variables, (5.2.2) considered as a function of the
parameters contained in $\alpha$ and $\Omega$ is called the likelihood function. The
maximum likelihood estimators of $\alpha$ and $\Omega$ are then obtained by
maximising this function (or, equivalently, its logarithm) with respect to
$\alpha$ and $\Omega$. We write the logarithm of the likelihood function as

$$ L(\alpha, \Omega) = -\frac{nT}{2} \ln(2\pi) - \frac{T}{2} \ln[\det(\Omega)] - \frac{1}{2} \sum_{t=1}^{T} [y_t - g_t(\alpha)]' \Omega^{-1} [y_t - g_t(\alpha)] $$

$$ = -\frac{nT}{2} \ln(2\pi) - \frac{T}{2} \ln[\det(\Omega)] - \frac{T}{2} \operatorname{tr}(\Omega^{-1} M) $$

where

$$ M = \frac{1}{T} \sum_{t=1}^{T} [y_t - g_t(\alpha)][y_t - g_t(\alpha)]' $$

Thus maximising $L(\alpha, \Omega)$ is equivalent to minimising
$$ \bar{L}(\alpha, \Omega) = \ln[\det(\Omega)] + \operatorname{tr}(\Omega^{-1} M) \qquad (5.2.3) $$
We can do this sequentially by first fixing $\alpha$ and minimising with respect
to $\Omega$. The resulting value of $\Omega$ will then itself be dependent on $\alpha$ and can
be substituted back into (5.2.3) to give a function concentrated in terms of $\alpha$.
This process is called concentrating the likelihood function and the
justification of this stepwise procedure can be found in Koopmans and
Hood (1953, pp. 156-158).

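Differentiating (5.2.3) with respect to the elements $\omega_{ij}$ of $\Omega$ we obtain

$$ \frac{\partial \bar{L}}{\partial \omega_{ij}} = \operatorname{tr}\left( \Omega^{-1} \frac{\partial \Omega}{\partial \omega_{ij}} \right) - \operatorname{tr}\left( \Omega^{-1} \frac{\partial \Omega}{\partial \omega_{ij}} \Omega^{-1} M \right) \qquad (5.2.4) $$

Remark The right hand side of (5.2.4) is obtained by using the following
two rules from matrix calculus:

Let $A = A(\lambda)$ be a square non-singular matrix of order $n$ whose
elements are differentiable functions of a scalar $\lambda$. Then

(i) $\dfrac{\partial}{\partial \lambda} [A^{-1}(\lambda)] = -A^{-1}(\lambda) \dfrac{\partial A(\lambda)}{\partial \lambda} A^{-1}(\lambda)$

(ii) $\dfrac{\partial}{\partial \lambda} \ln[\det A(\lambda)] = \operatorname{tr}\left[ A^{-1}(\lambda) \dfrac{\partial A(\lambda)}{\partial \lambda} \right]$

Rule (i) is established, for instance, in Malinvaud (1970b, pp. 196-197).
There are a number of alternative forms of Rule (ii) in popular use
(c.f. Fisk, 1967, pp. 147-148) but since these can cause confusion when
$A$ is symmetric as it is here, we outline the derivation of Rule (ii) in
Appendix B (the reader is also referred to the helpful remarks in Theil,
1971, p. 32). Returning now to (5.2.4) we have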


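$$ \frac{\partial \bar{L}}{\partial \omega_{ij}} = \operatorname{tr}\left\{ \frac{\partial \Omega}{\partial \omega_{ij}} \Omega^{-1} \right\} - \operatorname{tr}\left\{ \frac{\partial \Omega}{\partial \omega_{ij}} \Omega^{-1} M \Omega^{-1} \right\} = \operatorname{tr}\left\{ \frac{\partial \Omega}{\partial \omega_{ij}} (\Omega^{-1} - \Omega^{-1} M \Omega^{-1}) \right\} $$

From the symmetry of $\Omega$ we note that when $i \neq j$

$$ \left( \frac{\partial \Omega}{\partial \omega_{ij}} \right)_{rs} = \begin{cases} 1 & \text{if } r = i,\ s = j \\ 1 & \text{if } r = j,\ s = i \\ 0 & \text{otherwise} \end{cases} $$

so that

$$ \frac{\partial \bar{L}}{\partial \omega_{ij}} = (\Omega^{-1} - \Omega^{-1} M \Omega^{-1})_{ji} + (\Omega^{-1} - \Omega^{-1} M \Omega^{-1})_{ij} = 2 (\Omega^{-1} - \Omega^{-1} M \Omega^{-1})_{ij} \qquad (5.2.5) $$

because of the symmetry of $\Omega^{-1}$ and $M$. When $i = j$ we have

$$ \left( \frac{\partial \Omega}{\partial \omega_{ii}} \right)_{rs} = \begin{cases} 1 & \text{if } r = s = i \\ 0 & \text{otherwise} \end{cases} $$

so that

$$ \frac{\partial \bar{L}}{\partial \omega_{ii}} = (\Omega^{-1} - \Omega^{-1} M \Omega^{-1})_{ii} \qquad (5.2.6) $$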
Setting the first order derivatives of (5.2.3) equal to zero, it follows from
(5.2.5) and (5.2.6) that
$$ \Omega^{-1} = \Omega^{-1} M \Omega^{-1} $$
and hence the matrix
$$ \Omega = M $$
satisfies the first order conditions for a minimum of (5.2.3). The fact that (5.2.3) is
a minimum for this value of $\Omega$ is proved by Malinvaud (1970b, p. 339).
Substituting $\Omega = M$ into (5.2.3) we obtain the concentrated function

$$ L^*(\alpha) = \ln[\det(M)] + n \qquad (5.2.7) $$

The maximum likelihood estimator of $\alpha$ is now obtained by minimising
(5.2.7) with respect to $\alpha$ and it is clear that the same result is obtained by
minimising $\ln[\det(M)]$, which we were required to prove.
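
The concentration step can be checked numerically. The following is a minimal sketch in Python with NumPy, not part of the original solution: it evaluates ln[det(Ω)] + tr(Ω⁻¹M) from (5.2.3) at Ω = M and at a number of randomly drawn positive definite matrices. The residual matrix, the helper names `concentrated_objective` and `random_spd`, and the seed are all illustrative assumptions.

```python
import numpy as np

def concentrated_objective(omega, m):
    """ln det(Omega) + tr(Omega^{-1} M), the function in (5.2.3)."""
    _, logdet = np.linalg.slogdet(omega)
    return logdet + np.trace(np.linalg.solve(omega, m))

def random_spd(rng, n):
    """Hypothetical helper: a random symmetric positive definite matrix."""
    b = rng.standard_normal((n, n))
    return b @ b.T + 0.1 * np.eye(n)

rng = np.random.default_rng(0)
n = 3
resid = rng.standard_normal((50, n))      # stand-in for the residuals y_t - g_t(alpha)
M = resid.T @ resid / resid.shape[0]      # second moment matrix of the residuals

value_at_M = concentrated_objective(M, M)
floor = np.linalg.slogdet(M)[1] + n       # ln det(M) + n, as in (5.2.7)
trial_values = [concentrated_objective(random_spd(rng, n), M) for _ in range(200)]

print(value_at_M, floor, min(trial_values))
```

None of the trial matrices should give a smaller value than Ω = M, which agrees with (5.2.7); this is only a spot check, and the proof cited from Malinvaud remains the actual justification.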


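Part (b) If $\hat{\alpha}$ is the maximum likelihood estimator of $\alpha$ then from Part (a)
we know that $\hat{\alpha}$ minimises $\ln[\det(M)]$; and the maximum likelihood
estimator of $\Omega$ is

$$ \hat{\Omega} = \frac{1}{T} \sum_{t=1}^{T} [y_t - g_t(\hat{\alpha})][y_t - g_t(\hat{\alpha})]' \qquad (5.2.8) $$

By definition the pair $(\hat{\alpha}, \hat{\Omega})$ maximises $L(\alpha, \Omega)$, the logarithm of the
likelihood function. Moreover, $\hat{\alpha}$ maximises $L(\alpha, \Omega)$ when $\Omega = \hat{\Omega}$. From
the form of $L(\alpha, \Omega)$ it then follows that $\hat{\alpha}$ minimises

$$ \sum_{t=1}^{T} [y_t - g_t(\alpha)]' \hat{\Omega}^{-1} [y_t - g_t(\alpha)] \qquad (5.2.9) $$

In other words $\hat{\alpha} = \alpha_T(\hat{\Omega}^{-1})$ in the notation of Question 5.1. Hence, $\hat{\alpha}$ can
be regarded as a minimum distance estimator of $\alpha$ in which the distance or
metric is defined by the matrix $\hat{\Omega}^{-1}$.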

However, this representation of $\hat{\alpha}$ is not very useful in determining the
properties of $\hat{\alpha}$ and, in particular, in verifying that $\hat{\alpha}$ is consistent. For, $\hat{\Omega}$
itself depends on $\hat{\alpha}$ and, thus, in the absence of an independent theory
which explains the behaviour of $\hat{\alpha}$ when $T$ becomes large we cannot infer
the asymptotic behaviour of the random matrix $\hat{\Omega}$. The latter is necessary
if we are to appeal to the consistency of minimum distance estimators
(recall our Final Remark (a) in Solution 5.1 above). Fortunately, an
independent asymptotic theory for $\hat{\alpha}$ is readily available (see Phillips,
1976); in particular, we know that, under Conditions 5.1.B and 5.1.C
which we discussed in Solution 5.1, $\hat{\alpha}$ converges to $\alpha^0$ almost surely (and
hence in probability) as $T \to \infty$.
We can also compare the pair of maximum likelihood estimators $(\hat{\alpha}, \hat{\Omega})$
with the pair of estimators obtained from the following iteration:
(i) Find the minimum distance estimator $\alpha_T(S)$ taking any positive
definite matrix for $S$ (such as the identity matrix $I$).
(ii) Estimate the covariance matrix $\Omega$ by constructing the second
moment matrix of the residuals from the regression in (i):

$$ M_1 = \frac{1}{T} \sum_{t=1}^{T} \{y_t - g_t[\alpha_T(S)]\} \{y_t - g_t[\alpha_T(S)]\}' \qquad (5.2.10) $$

(iii) Find the new minimum distance estimator
$$ \alpha_T(M_1^{-1}) \qquad (5.2.11) $$

(iv) Return to (ii) and continue the iteration from (ii) to (iii) and back
again to (ii) until the procedure has converged (that is, until
successive estimates of $\alpha$ are numerically the same at a given level
of tolerance).
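
Steps (i) to (iv) can be sketched in a few lines of Python with NumPy. This is an illustration under stated assumptions, not the authors' implementation: it uses the two-equation model (5.1.6)-(5.1.7), for which the minimum distance estimator has a closed form, and the function names, simulated data, seed and tolerance are all hypothetical choices.

```python
import numpy as np

def min_distance_alpha(y, x, s):
    """alpha_T(S) for y_it = alpha * x_t + u_it (i = 1, 2), in closed form.

    Minimises sum_t (y_t - alpha*x_t*iota)' S (y_t - alpha*x_t*iota) for symmetric S.
    """
    iota = np.ones(y.shape[1])
    num = np.sum(x * (y @ s @ iota))
    den = (iota @ s @ iota) * np.sum(x ** 2)
    return num / den

def residual_moment(y, x, alpha):
    """Second moment matrix of the residuals, as in (5.2.10)."""
    resid = y - alpha * x[:, None]
    return resid.T @ resid / len(x)

def iterated_min_distance(y, x, tol=1e-10, max_iter=100):
    alpha = min_distance_alpha(y, x, np.eye(y.shape[1]))            # step (i)
    for _ in range(max_iter):
        omega = residual_moment(y, x, alpha)                        # step (ii)
        new_alpha = min_distance_alpha(y, x, np.linalg.inv(omega))  # step (iii)
        converged = abs(new_alpha - alpha) < tol                    # step (iv)
        alpha = new_alpha
        if converged:
            break
    return alpha, residual_moment(y, x, alpha)

# Simulated data; the parameter values below are illustrative assumptions.
rng = np.random.default_rng(1)
T, alpha0 = 2000, 1.5
omega0 = np.array([[1.0, 0.6], [0.6, 2.0]])
x = rng.uniform(1.0, 2.0, T)
u = rng.multivariate_normal(np.zeros(2), omega0, size=T)
y = alpha0 * x[:, None] + u

alpha_hat, omega_hat = iterated_min_distance(y, x)
print(alpha_hat)
print(omega_hat)
```

At convergence the returned pair satisfies, up to the tolerance, the same interdependence as the maximum likelihood pair: the covariance estimate is the residual moment matrix at the final estimate of α, and that estimate is the minimum distance estimator weighted by the inverse of this matrix.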

The above procedure is known as the iterated minimum distance (or
iterated generalised least squares) procedure. It was suggested in
Malinvaud (1970b, pp. 337-338) and has more recently been considered
by Phillips (1976) and Barnett (1976). Malinvaud suggested that we can
expect the iteration in (iv) to be convergent and also observed that, upon
convergence, the estimators from this procedure share the same property
of interdependence exhibited by the pair of maximum likelihood
estimators $(\hat{\alpha}, \hat{\Omega})$. Indeed, if we denote the estimators that emerge from
the iteration in (iv) by $(\alpha^{**}, \Omega^{**})$ we have from (5.2.10) that

$$ \Omega^{**} = \frac{1}{T} \sum_{t=1}^{T} [y_t - g_t(\alpha^{**})][y_t - g_t(\alpha^{**})]' $$

and

$$ \alpha^{**} = \alpha_T(\Omega^{**-1}) $$

from (5.2.11). These considerations suggest that the iterated minimum
distance procedure (i)-(iv) may well provide a convenient route to the
estimators $(\hat{\alpha}, \hat{\Omega})$. But, for this to be so it is important that the iteration
in (iv) be convergent and that the point of convergence $\alpha^{**}$ yields a global
maximum of the concentrated likelihood function (or, equivalently, a
global minimum of $L^*(\alpha)$ defined in (5.2.7) above). Regularity conditions
under which this is indeed the case are given in the article by Phillips
(1976). Moreover, when the procedure is convergent and this convergence
holds uniformly in $T$ (note that to obtain $\alpha^{**}$ the iteration in (iv) is
needed for every sample size $T$) Barnett (1976) has shown that the
consistency of the maximum likelihood estimator can be deduced from
the consistency of the minimum distance estimators of $\alpha$ obtained at each
stage of the iteration in (iv).


Remark We have in this solution concentrated on the consistency 
properties of maximum likelihood and minimum distance estimators. We 
will consider other asymptotic properties in our next question. 


Solution 5.3


Part (a) The asymptotic sampling properties of $\alpha_T(S_T)$ or more precisely
the limiting distribution of $\sqrt{T}[\alpha_T(S_T) - \alpha^0]$ are discussed by Malinvaud
(1970b, pp. 331-336). In the first place, we require $\alpha_T(S_T)$ to be a
consistent estimator of $\alpha^0$. To this end we can assume that Conditions
5.1.B and 5.1.C of Solution 5.1 hold. We remark in passing that if for
some reason $\alpha_T(S_T)$ is not a consistent estimator of $\alpha^0$ [for instance, there
may be a specification error in the systematic component $g_t(\alpha)$] then we
may still be able to discuss the asymptotic sampling properties of $\alpha_T(S_T)$
but we will need more information before doing so (such as the true
specification of the model if $g_t(\alpha)$ is misspecified) and we may need to
impose stronger conditions on the disturbances (such as the existence of
fourth order moments; compare Solution 6.10 below).

Following Malinvaud (1970b), we construct a matrix $Z_t$ whose $(i, j)$th
element is

$$ z_{ijt} = \partial g_{it}(\alpha^0) / \partial \alpha_j $$

and we let

$$ M_T(S) = \frac{1}{T} \sum_{t=1}^{T} Z_t' S Z_t $$

for any positive definite matrix $S$. We now impose two further conditions:


Condition 5.3.A The parameter space $\Phi$ contains a neighbourhood $V^0$ of
the vector $\alpha^0$ of the true parameter values.


Condition 5.3.B
(i) In the neighbourhood $V^0$ of $\alpha^0$ the functions $g_{it}(\alpha)$ and their first
three derivatives are uniformly bounded.
(ii) For any positive definite matrix $S$ the matrix $M_T(S)$ is positive
definite and has a positive definite limit $M(S)$ as $T \to \infty$.


Under Conditions 5.1.B, 5.1.C, 5.3.A and 5.3.B and given that the
matrix $S_T$ tends in probability to the positive definite matrix $S$, Malinvaud
(1970b, pp. 334-335) proves that the vector

$$ \sqrt{T} [\alpha_T(S_T) - \alpha^0] $$

has a limiting normal distribution as $T \to \infty$. The mean of this limiting
distribution is the zero vector, and the covariance matrix is

$$ [M(S)]^{-1} M(S \Omega S) [M(S)]^{-1} \qquad (5.3.1) $$


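Part (b) When $S = \Omega^{-1}$ (so that $S_T$ converges in probability to $\Omega^{-1}$) we
see that (5.3.1) reduces to

$$ [M(\Omega^{-1})]^{-1} $$

To prove that $\alpha_T(S_T)$ is asymptotically efficient in the class of minimum
distance estimators when $S_T$ tends to $\Omega^{-1}$ in probability we must show
that

$$ [M(S)]^{-1} M(S \Omega S) [M(S)]^{-1} - [M(\Omega^{-1})]^{-1} \qquad (5.3.2) $$

is a positive semi-definite matrix. To do this we first define $T$ matrices
$\{A_\tau : \tau = 1, \ldots, T\}$ by the equations

$$ \left( \frac{1}{T} \sum_{t=1}^{T} Z_t' S Z_t \right)^{-1} Z_\tau' S = \left( \frac{1}{T} \sum_{t=1}^{T} Z_t' \Omega^{-1} Z_t \right)^{-1} Z_\tau' \Omega^{-1} + A_\tau \qquad (\tau = 1, \ldots, T) $$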

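When $\alpha$ has $p$ components, $A_\tau$ has dimension $p \times n$ for all $\tau$. We note that

$$ \sum_{\tau=1}^{T} A_\tau Z_\tau = 0 \qquad (5.3.3) $$

and also that, writing $M_T(S) = T^{-1} \sum_{t=1}^{T} Z_t' S Z_t$,

$$ [M_T(S)]^{-1} Z_\tau' S \Omega S Z_\tau [M_T(S)]^{-1} = [M_T(\Omega^{-1})]^{-1} Z_\tau' \Omega^{-1} Z_\tau [M_T(\Omega^{-1})]^{-1} $$
$$ \qquad + [M_T(\Omega^{-1})]^{-1} Z_\tau' \Omega^{-1} \Omega A_\tau' + A_\tau \Omega \Omega^{-1} Z_\tau [M_T(\Omega^{-1})]^{-1} + A_\tau \Omega A_\tau' \qquad (5.3.4) $$

Hence, summing (5.3.4) over $\tau = 1, \ldots, T$, dividing through by $T$, and
using (5.3.3) we obtain

$$ [M_T(S)]^{-1} M_T(S \Omega S) [M_T(S)]^{-1} = [M_T(\Omega^{-1})]^{-1} + \frac{1}{T} \sum_{\tau=1}^{T} A_\tau \Omega A_\tau' $$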


We now let $K$ be a non-singular matrix for which $\Omega = KK'$. (This is
possible since $\Omega$ is positive definite.) Then for any non-zero vector $d$ with
$p$ components we note that

$$ d' \left( \frac{1}{T} \sum_{\tau=1}^{T} A_\tau \Omega A_\tau' \right) d = \frac{1}{T} \sum_{\tau=1}^{T} b_\tau' b_\tau $$

where $b_\tau = K' A_\tau' d$. But

$$ b_\tau' b_\tau = \sum_{i=1}^{n} b_{i\tau}^2 \geq 0 $$

where $b_{i\tau}$ denotes the $i$th component of $b_\tau$. It follows that

$$ d' \left( \frac{1}{T} \sum_{\tau=1}^{T} A_\tau \Omega A_\tau' \right) d \geq 0 $$

for any non-zero vector $d$ and hence
$$ [M_T(S)]^{-1} M_T(S \Omega S) [M_T(S)]^{-1} - [M_T(\Omega^{-1})]^{-1} $$
is positive semi-definite. This holds for all finite $T$ and therefore,

$$ \lim_{T \to \infty} d' \{ [M_T(S)]^{-1} M_T(S \Omega S) [M_T(S)]^{-1} - [M_T(\Omega^{-1})]^{-1} \} d \geq 0. \qquad (5.3.5) $$

That is, the limit of a sequence of non-negative numbers must be non-negative,
if that limit is known to exist. Here, the limit exists since $M_T(S)$
and $M_T(\Omega^{-1})$ are known to have non-singular limits by 5.3.B (ii). We can
write (5.3.5) as

$$ d' [M(S)]^{-1} M(S \Omega S) [M(S)]^{-1} d - d' [M(\Omega^{-1})]^{-1} d \geq 0 $$

so that the matrix (5.3.2) is positive semi-definite. It follows that $\alpha_T(S_T)$
is asymptotically efficient in the minimum distance class when $S_T \to \Omega^{-1}$
in probability.
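
The finite-sample inequality behind (5.3.5) is easy to spot-check numerically. The sketch below, in Python with NumPy, is an illustration rather than part of the proof: the matrices $Z_t$, $\Omega$ and $S$ are random draws, and the names `M_T`, `covariance_gap` and `random_spd` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
T, n, p = 40, 3, 2

def random_spd(k):
    """Hypothetical helper: a random symmetric positive definite k x k matrix."""
    b = rng.standard_normal((k, k))
    return b @ b.T + 0.5 * np.eye(k)

Z = rng.standard_normal((T, n, p))   # Z[t] plays the role of the n x p matrix Z_t
omega = random_spd(n)
S = random_spd(n)

def M_T(weight):
    """(1/T) * sum_t Z_t' W Z_t for a symmetric n x n weight matrix W."""
    return np.einsum('tia,ij,tjb->ab', Z, weight, Z) / T

def covariance_gap(weight):
    """[M_T(S)]^{-1} M_T(S Omega S) [M_T(S)]^{-1} - [M_T(Omega^{-1})]^{-1}."""
    m_inv = np.linalg.inv(M_T(weight))
    sandwich = m_inv @ M_T(weight @ omega @ weight) @ m_inv
    return sandwich - np.linalg.inv(M_T(np.linalg.inv(omega)))

gap = covariance_gap(S)
eigenvalues = np.linalg.eigvalsh((gap + gap.T) / 2)
gap_at_optimum = covariance_gap(np.linalg.inv(omega))

print(eigenvalues)                   # expected to be non-negative up to rounding
print(np.abs(gap_at_optimum).max())  # expected to be zero up to rounding
```

The eigenvalues of the gap should be non-negative for any positive definite choice of S, and the gap should vanish when S is the inverse of Ω, matching the argument above.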


Remark The above use of the term asymptotically efficient is a little
different from that in Solution 3.7. Here, we consider a very specific class
of estimators with which we are comparing $\alpha_T(S_T)$, where $S_T$ tends in
probability to $\Omega^{-1}$ (that is, the class of minimum distance estimators
$\alpha_T(S_T)$ for which $S_T$ tends in probability to an arbitrary positive definite
matrix $S$). If we assume, however, that the disturbances $u_t$ in the model
(5.1.1) are normally distributed, then the estimator $\alpha_T(S_T)$, where
$S_T \to \Omega^{-1}$ in probability, is asymptotically efficient in a more general
class of estimators. It is, indeed, best asymptotically normal in the sense
(and with the associated limitations of the definition) discussed in
Solution 3.7. We need only verify that the asymptotic covariance matrix

$$ [M(\Omega^{-1})]^{-1} $$

attains the limit of the Cramer-Rao (matrix) lower bound. That
this is so is shown in Malinvaud (1970b, pp. 340-341).


Solution 5.4 


Part (a) The model (5.4.1) is a single equation example of the general
constrained linear model

$$ y_t = A(\alpha^0) x_t + u_t \qquad (t = 1, \ldots, T) \qquad (5.4.3) $$

where $y_t$ is a vector of $n$ endogenous variables, $x_t$ is a vector of $m$
exogenous variables and $A(\alpha^0)$ is a parameter matrix whose elements are
functions of the basic parameter vector $\alpha$, whose true value is denoted by
$\alpha^0$.

The model (5.4.3) is discussed in detail by Malinvaud (1970b, pp.
348-360). In particular, Malinvaud proves in his Theorem 3 on page 350
that, if $S_T$ is a positive definite matrix which tends in probability to a
positive definite matrix $S$, then the estimator $\alpha_T(S_T)$ which minimises

$$ \sum_{t=1}^{T} [y_t - A(\alpha) x_t]' S_T [y_t - A(\alpha) x_t] \qquad (5.4.4) $$

is a consistent estimator of $\alpha^0$ provided the following conditions hold:


Condition 5.4.A (Malinvaud's Assumption 1 on page 331) The disturbance
vectors $u_t$ $(t = 1, \ldots, T)$ are independently and identically distributed
with zero mean vector and non-singular covariance matrix $\Omega$.


Condition 5.4.B (Malinvaud's Assumption 4 on page 349) The exogenous
variable vectors $x_t$ $(t = 1, \ldots, T)$ are non-random bounded quantities for
which the matrix $(1/T) \sum_{t=1}^{T} x_t x_t'$ is non-singular and tends to a non-singular
limit as $T \to \infty$.


Condition 5.4.C (Malinvaud's Assumption 5 on page 349) If $A^0 = A(\alpha^0)$ and
$\alpha_T$ is a sequence of vectors for which $A(\alpha_T)$ converges to $A^0$ then $\alpha_T$
converges to $\alpha^0$.


We see from the assumption made about the $u_t$ and the $x_t$ in (5.4.1)
that Conditions 5.4.A and 5.4.B are both satisfied in the present case.
Condition 5.4.C is essentially concerned with the identifiability of the
true vector $\alpha^0$ in the matrix $A^0 = A(\alpha^0)$. For, let us suppose there was a
vector $\alpha^* \neq \alpha^0$ for which $A(\alpha^*) = A^0$. The vectors $\alpha^*$ and $\alpha^0$ would
then be indistinguishable in the true coefficient matrix $A^0$ and we would
not be able to identify $\alpha^0$ as the true vector. To show that this contradicts
Condition 5.4.C we need only select the sequence $\alpha_T = \alpha^*$ $(T = 1, 2, \ldots)$.
For, $A(\alpha_T)$ then equals $A^0$ for all $T$ and $A(\alpha_T)$ converges to $A^0$ as $T \to \infty$.
But, since $\alpha_T \to \alpha^*$ as $T \to \infty$ and $\alpha^* \neq \alpha^0$ this contradicts Condition 5.4.C.
Hence, if 5.4.C is to be satisfied it is necessary that $\alpha^0$ is identifiable in the
true coefficient matrix $A^0$.

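In the present case $A(\alpha)$ is the vector

$$ A(\alpha) = (\alpha, e^{\alpha}) \qquad (5.4.5) $$

so that if $A(\alpha_T)$ converges to $A^0 = A(\alpha^0)$ it follows that

$$ \alpha_T \to \alpha^0 \quad \text{and} \quad \exp \alpha_T \to \exp \alpha^0 $$

Hence, Condition 5.4.C is satisfied in this case; and, therefore, $\hat{\alpha}$ is a
consistent estimator of $\alpha^0$ in (5.4.1).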
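
A small simulation can illustrate this consistency result. The sketch below, in Python with NumPy, assumes that (5.4.1) has the form $y_t = \alpha x_{1t} + e^{\alpha} x_{2t} + u_t$, which is what the coefficient vector (5.4.5) implies; the bounded uniform regressors, the unit-variance normal disturbances, the grid search and the name `nls_alpha` are all illustrative assumptions.

```python
import numpy as np

def nls_alpha(y, x1, x2, grid):
    """Grid minimiser of sum_t (y_t - a*x1_t - exp(a)*x2_t)^2.

    The sum of squares is expanded in terms of six sample moments, so each
    grid point costs O(1) once the moments have been computed.
    """
    syy, sy1, sy2 = y @ y, y @ x1, y @ x2
    s11, s12, s22 = x1 @ x1, x1 @ x2, x2 @ x2
    e = np.exp(grid)
    ssr = (syy - 2 * grid * sy1 - 2 * e * sy2
           + grid ** 2 * s11 + 2 * grid * e * s12 + e ** 2 * s22)
    return grid[np.argmin(ssr)]

rng = np.random.default_rng(3)
alpha0 = 0.5
grid = np.linspace(-2.0, 2.0, 4001)   # step 0.001; a compact parameter set

errors = {}
for T in (50, 500, 5000):
    x1 = rng.uniform(0.0, 1.0, T)     # bounded regressors, in the spirit of Condition 5.4.B
    x2 = rng.uniform(0.0, 1.0, T)
    u = rng.standard_normal(T)
    y = alpha0 * x1 + np.exp(alpha0) * x2 + u
    errors[T] = abs(nls_alpha(y, x1, x2, grid) - alpha0)

# Noise-free check: with u_t = 0 the minimiser should be the true value itself.
exact = nls_alpha(alpha0 * x1 + np.exp(alpha0) * x2, x1, x2, grid)

print(errors)
print(exact)
```

The absolute estimation error would be expected to shrink as T grows, although any single simulated path can be non-monotone; the grid resolution also puts a floor of about 0.001 on the attainable accuracy.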
Part (b) In the new model (5.4.2) the vector $x_t$ of (5.4.3) is now

$$ x_t = (1, t)' $$

and therefore

$$ M_{xx} = \frac{1}{T} \sum_{t=1}^{T} x_t x_t' = \frac{1}{T} \sum_{t=1}^{T} \begin{bmatrix} 1 & t \\ t & t^2 \end{bmatrix} = \begin{bmatrix} 1 & \frac{1}{2}(T+1) \\ \frac{1}{2}(T+1) & \frac{1}{6}(T+1)(2T+1) \end{bmatrix} $$

Although $M_{xx}$ is non-singular for fixed $T$ it is clear that this matrix does
not converge to a finite matrix as $T \to \infty$. Thus, Condition 5.4.B is not
satisfied and if we are to establish that the estimator $\hat{\alpha}$ is consistent, we
cannot rely directly on Malinvaud's Theorem 3 on page 350, as we did in
Part (a). However, what we can do is appeal directly to Condition 5.1.A,
which as we have seen in Solution 5.1 is sufficient to ensure that $\hat{\alpha}$ is
consistent.

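First of all we define

$$ Q_T(\alpha) = \sum_{t=1}^{T} [\alpha + (\exp \alpha) t - \alpha^0 - (\exp \alpha^0) t]^2 $$

$$ = \sum_{t=1}^{T} \left\{ [\alpha - \alpha^0, \ \exp \alpha - \exp \alpha^0] \begin{bmatrix} 1 \\ t \end{bmatrix} \right\}^2 $$

$$ = \sum_{t=1}^{T} [\alpha - \alpha^0, \ \exp \alpha - \exp \alpha^0] \begin{bmatrix} 1 \\ t \end{bmatrix} [1, \ t] \begin{bmatrix} \alpha - \alpha^0 \\ \exp \alpha - \exp \alpha^0 \end{bmatrix} $$

$$ = [\alpha - \alpha^0, \ \exp \alpha - \exp \alpha^0] \begin{bmatrix} T & \frac{1}{2} T(T+1) \\ \frac{1}{2} T(T+1) & \frac{1}{6} T(T+1)(2T+1) \end{bmatrix} \begin{bmatrix} \alpha - \alpha^0 \\ \exp \alpha - \exp \alpha^0 \end{bmatrix} \qquad (5.4.6) $$

and also

$$ V_T(\alpha) = \frac{1}{Q_T(\alpha)} \sum_{t=1}^{T} [\alpha + (\exp \alpha) t - \alpha^0 - (\exp \alpha^0) t] u_t $$

$$ = \frac{1}{Q_T(\alpha)} [\alpha - \alpha^0, \ \exp \alpha - \exp \alpha^0] \begin{bmatrix} \sum_{t=1}^{T} u_t \\ \sum_{t=1}^{T} t u_t \end{bmatrix} \qquad (5.4.7) $$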

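Then $\hat{\alpha}$ will be consistent if we can verify that

(i) $P[\inf_{\alpha \in \omega} Q_T(\alpha) = 0] \to 0$ (as $T \to \infty$)

and

(ii) $P[\sup_{\alpha \in \omega} |V_T(\alpha)| > \frac{1}{2}] \to 0$ (as $T \to \infty$)

where $\omega$ is any closed set which does not contain $\alpha^0$ (Malinvaud, 1970b,
Lemma on p. 330).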
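To verify (i) we first note the representation of $Q_T(\alpha)$ as a quadratic
form in (5.4.6). This gives us the inequality

$$ Q_T(\alpha) \geq \lambda_T [\alpha - \alpha^0, \ \exp \alpha - \exp \alpha^0] \begin{bmatrix} \alpha - \alpha^0 \\ \exp \alpha - \exp \alpha^0 \end{bmatrix} \qquad (5.4.8) $$

where $\lambda_T$ is the smaller eigenvalue of the matrix

$$ N_T = \begin{bmatrix} T & \frac{1}{2} T(T+1) \\ \frac{1}{2} T(T+1) & \frac{1}{6} T(T+1)(2T+1) \end{bmatrix} \qquad (5.4.9) $$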


Remark The inequality (5.4.8) follows from the fact that if $x$ is an
n-vector and $A$ is a symmetric $n \times n$ matrix then

$$ x'Ax \geq \lambda_n x'x \quad \text{for all } x $$

where $\lambda_n$ is the smallest eigenvalue of $A$ (see, for instance, Rao, 1973,
p. 62). It now follows from (5.4.8) that

$$ \inf_{\alpha \in \omega} Q_T(\alpha) \geq \lambda_T \inf_{\alpha \in \omega} [(\alpha - \alpha^0)^2 + (\exp \alpha - \exp \alpha^0)^2] $$

and, since $\omega$ is closed and does not contain $\alpha^0$, we know that

$$ \inf_{\alpha \in \omega} (\alpha - \alpha^0)^2 \geq \epsilon > 0 $$

where $\epsilon$ is a small positive quantity. Then

$$ P[\inf_{\alpha \in \omega} Q_T(\alpha) = 0] \leq P(\lambda_T \epsilon = 0) = P(\lambda_T = 0) $$

But $\lambda_T > 0$ for all $T > 1$ since the matrix (5.4.9) is positive definite for
$T > 1$. Hence

$$ P[\inf_{\alpha \in \omega} Q_T(\alpha) = 0] = 0 \quad \text{for } T > 1 $$

and (i) is verified.
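We now turn to (ii) and first of all introduce the matrix

$$ D_T = \begin{bmatrix} T^{1/2} & 0 \\ 0 & T^{3/2} \end{bmatrix} $$

Using (5.4.6) and (5.4.9) we write $Q_T(\alpha)$ as

$$ Q_T(\alpha) = [\alpha - \alpha^0, \ \exp \alpha - \exp \alpha^0] D_T D_T^{-1} N_T D_T^{-1} D_T \begin{bmatrix} \alpha - \alpha^0 \\ \exp \alpha - \exp \alpha^0 \end{bmatrix} $$

But

$$ D_T^{-1} N_T D_T^{-1} = \begin{bmatrix} 1 & \dfrac{T+1}{2T} \\ \dfrac{T+1}{2T} & \dfrac{(T+1)(2T+1)}{6T^2} \end{bmatrix} = K_T $$

say, so that

$$ Q_T(\alpha) = [T^{1/2}(\alpha - \alpha^0), \ T^{3/2}(\exp \alpha - \exp \alpha^0)] K_T \begin{bmatrix} T^{1/2}(\alpha - \alpha^0) \\ T^{3/2}(\exp \alpha - \exp \alpha^0) \end{bmatrix} $$

and

$$ Q_T(\alpha) \geq \mu_T [T^{1/2}(\alpha - \alpha^0), \ T^{3/2}(\exp \alpha - \exp \alpha^0)] \begin{bmatrix} T^{1/2}(\alpha - \alpha^0) \\ T^{3/2}(\exp \alpha - \exp \alpha^0) \end{bmatrix} \qquad (5.4.10) $$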

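where $\mu_T$ is the smaller eigenvalue of $K_T$. Thus, from (5.4.7) and (5.4.10)
we have

$$ |V_T(\alpha)| \leq \frac{\left| [\alpha - \alpha^0, \ \exp \alpha - \exp \alpha^0] D_T D_T^{-1} \begin{bmatrix} \sum_{t=1}^{T} u_t \\ \sum_{t=1}^{T} t u_t \end{bmatrix} \right|}{\mu_T [T(\alpha - \alpha^0)^2 + T^3 (\exp \alpha - \exp \alpha^0)^2]} $$

$$ \leq \frac{\| [T^{1/2}(\alpha - \alpha^0), \ T^{3/2}(\exp \alpha - \exp \alpha^0)] \| \ \| (T^{-1/2} \sum_{t=1}^{T} u_t, \ T^{-3/2} \sum_{t=1}^{T} t u_t) \|}{\mu_T [T(\alpha - \alpha^0)^2 + T^3 (\exp \alpha - \exp \alpha^0)^2]} \qquad (5.4.11) $$

where by $\|a\|$ for some vector $a = (a_i)_{n \times 1}$ we mean the Euclidean distance
$(a'a)^{1/2}$. [The last inequality (5.4.11) above is then obtained by Cauchy's
inequality (Hardy, 1952, p. 34): $|b'a| \leq (b'b)^{1/2} (a'a)^{1/2}$]. From (5.4.11)
we now obtain

$$ |V_T(\alpha)| \leq \frac{[(T^{-1/2} \sum_{t=1}^{T} u_t)^2 + (T^{-3/2} \sum_{t=1}^{T} t u_t)^2]^{1/2}}{\mu_T [T(\alpha - \alpha^0)^2 + T^3 (\exp \alpha - \exp \alpha^0)^2]^{1/2}} = \frac{[(T^{-1} \sum_{t=1}^{T} u_t)^2 + (T^{-2} \sum_{t=1}^{T} t u_t)^2]^{1/2}}{\mu_T [(\alpha - \alpha^0)^2 + T^2 (\exp \alpha - \exp \alpha^0)^2]^{1/2}} $$

so that

$$ \sup_{\alpha \in \omega} |V_T(\alpha)| \leq \frac{[(T^{-1} \sum_{t=1}^{T} u_t)^2 + (T^{-2} \sum_{t=1}^{T} t u_t)^2]^{1/2}}{\mu_T \epsilon} $$

where $\epsilon$ is now defined by

$$ \epsilon = \inf_{\alpha \in \omega} |\alpha - \alpha^0| $$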
T p 2 
P[ sup V7 (a) > 4] <P|(r- » " ota ee ee tu Se | 
aecw t=] 

which, by Tchebycheff’s theorem (Cramer, 1946, p. 182; see also solution 
' 2.15), is less than 

E[(T71 2 fey ue)? + (1? Zia tue)? ] 

upe?/4 
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POUT VEE A (Oe oe) 
wpe? /4 
=1) 2 1 7-3 al Je 4) 2 
WE a AAT * (Tos) (2 ieee (5.4.12) 
4 ppe? 


Now $\mu_T$ is the smaller characteristic root of $K_T$ and from the definition
of $K_T$ it is clear that $K_T$ is positive definite for $T > 1$ and tends to the
positive definite limit

$$ \begin{bmatrix} 1 & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{3} \end{bmatrix} $$

as $T \to \infty$. Hence $\mu_T$ is positive for all $T$ and has a positive limit as $T \to \infty$.

It follows that (5.4.12) tends to zero as $T \to \infty$ and, therefore,

$$ P(\sup_{\alpha \in \omega} |V_T(\alpha)| > \tfrac{1}{2}) \to 0 $$

as $T \to \infty$. This verifies (ii). Since (i) and (ii) both hold in the present case,
$\hat{\alpha}$ is consistent by Malinvaud's Lemma.
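
The role of the normalising matrix in this argument can be seen numerically. The following Python sketch with NumPy (an illustration only; the function names `N_T` and `K_T` mirror the symbols above and the sample sizes are arbitrary) builds the moment matrix of the regressors (1, t), rescales it by the inverse of diag(T^{1/2}, T^{3/2}) on both sides, and prints the smaller eigenvalue of the rescaled matrix together with its distance from the limit matrix with rows (1, 1/2) and (1/2, 1/3).

```python
import numpy as np

def N_T(T):
    """Sum over t = 1..T of [1, t]' [1, t], the matrix in (5.4.9)."""
    t = np.arange(1, T + 1, dtype=float)
    X = np.column_stack([np.ones(T), t])
    return X.T @ X

def K_T(T):
    """D_T^{-1} N_T D_T^{-1} with D_T = diag(T^{1/2}, T^{3/2})."""
    d_inv = np.diag([T ** -0.5, T ** -1.5])
    return d_inv @ N_T(T) @ d_inv

K_limit = np.array([[1.0, 0.5], [0.5, 1.0 / 3.0]])

for T in (10, 100, 10_000):
    K = K_T(T)
    mu_T = np.linalg.eigvalsh(K)[0]          # smaller eigenvalue of K_T
    print(T, mu_T, np.abs(K - K_limit).max())
```

While the unnormalised moment matrix diverges, the rescaled matrix settles down quickly and its smaller eigenvalue stays bounded away from zero, which is what the bound (5.4.12) relies on.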


Solution 5.5 


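Part (a) We write

$$ y_t - A(\alpha) x_t = y_t - A^* x_t + [A^* - A(\alpha)] x_t $$

so that

$$ T^{-1} \sum_{t=1}^{T} [y_t - A(\alpha) x_t]' (M_{uu}^*)^{-1} [y_t - A(\alpha) x_t] = T^{-1} \sum_{t=1}^{T} (y_t - A^* x_t)' (M_{uu}^*)^{-1} (y_t - A^* x_t) $$
$$ \qquad + 2 T^{-1} \sum_{t=1}^{T} (y_t - A^* x_t)' (M_{uu}^*)^{-1} [A^* - A(\alpha)] x_t $$
$$ \qquad + T^{-1} \sum_{t=1}^{T} x_t' [A^* - A(\alpha)]' (M_{uu}^*)^{-1} [A^* - A(\alpha)] x_t. $$

Now, since the trace of a scalar equals the scalar itself, we have

$$ T^{-1} \sum_{t=1}^{T} (y_t - A^* x_t)' (M_{uu}^*)^{-1} (y_t - A^* x_t) = T^{-1} \sum_{t=1}^{T} \operatorname{tr}[(y_t - A^* x_t)' (M_{uu}^*)^{-1} (y_t - A^* x_t)] $$
$$ = \operatorname{tr}\left[ (M_{uu}^*)^{-1} T^{-1} \sum_{t=1}^{T} (y_t - A^* x_t)(y_t - A^* x_t)' \right] $$
$$ = \operatorname{tr}[(M_{uu}^*)^{-1} M_{uu}^*] $$
$$ = n $$

In a similar way we find that

$$ 2 T^{-1} \sum_{t=1}^{T} (y_t - A^* x_t)' (M_{uu}^*)^{-1} [A^* - A(\alpha)] x_t = 2 \operatorname{tr}\left\{ (M_{uu}^*)^{-1} [A^* - A(\alpha)] T^{-1} \sum_{t=1}^{T} x_t (y_t - A^* x_t)' \right\} = 0 $$

since

$$ \sum_{t=1}^{T} x_t (y_t - A^* x_t)' = \sum_{t=1}^{T} x_t y_t' - \left( \sum_{t=1}^{T} x_t x_t' \right) A^{*\prime} = 0 $$

from the definition of $A^*$ [see (3.1.2) above]. Finally

$$ T^{-1} \sum_{t=1}^{T} x_t' [A^* - A(\alpha)]' (M_{uu}^*)^{-1} [A^* - A(\alpha)] x_t = \operatorname{tr}\{ [A^* - A(\alpha)]' (M_{uu}^*)^{-1} [A^* - A(\alpha)] M_{xx} \} $$

and thus (5.5.1) equals (5.5.2) plus a constant which does not depend on
$\alpha$. It follows that the minimisation of (5.5.2) with respect to $\alpha$ is equivalent
to the minimisation of (5.5.1).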


Part (b) We make the following assumption in addition to Conditions
5.4.A, 5.4.B and 5.4.C:


Condition 5.5.A (Malinvaud's Assumption 6 on page 349) The set of
possible values of the parameter vector $\alpha$ contains a neighbourhood of $\alpha^0$
in which the functions $a_{ij}(\alpha)$, where $A(\alpha) = [a_{ij}(\alpha)]$, have bounded
derivatives up to the third order. The vector $\alpha^0$ is not a singular point of
$A(\alpha)$, which requires that the $p \times nm$ matrix

$$ \frac{\partial [\operatorname{vec} A(\alpha^0)]'}{\partial \alpha} $$

have full rank $(= p < nm)$.


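Remark We refer the reader at this point to Appendix B for a discussion
of the notation of matrix and vector differentiation that we are using.

Under Conditions 5.4.A, 5.4.B, 5.4.C and 5.5.A, Malinvaud (1970b,
Theorems 4 and 5 on pages 352 and 355) proves that $\sqrt{T}(\alpha^{**} - \alpha^0)$ has a
limiting normal distribution with mean vector zero and covariance matrix
which is the inverse of the matrix whose $(i, j)$th element is

$$ \operatorname{tr}\left[ \frac{\partial A(\alpha^0)'}{\partial \alpha_i} \Omega^{-1} \frac{\partial A(\alpha^0)}{\partial \alpha_j} M \right] \qquad (5.5.3) $$

where $M = \lim_{T \to \infty} M_{xx}$. But from Appendix A (in particular, property (ii)
on page 496) we see that (5.5.3) can be written as

$$ \frac{\partial [\operatorname{vec} A(\alpha^0)]'}{\partial \alpha_i} (\Omega^{-1} \otimes M) \frac{\partial \operatorname{vec} A(\alpha^0)}{\partial \alpha_j} $$

and thus the covariance matrix of the limiting distribution of $\sqrt{T}(\alpha^{**} - \alpha^0)$
is

$$ \left\{ \frac{\partial [\operatorname{vec} A(\alpha^0)]'}{\partial \alpha} (\Omega^{-1} \otimes M) \frac{\partial \operatorname{vec} A(\alpha^0)}{\partial \alpha'} \right\}^{-1} $$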


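Part (c) In view of Condition 5.5.A we can write $A(\alpha)$ in a neighbourhood
of $\alpha = \alpha^0$ in the form

$$
A(\alpha) = A(\alpha^0) + \sum_{i=1}^{p}\frac{\partial A(\alpha^0)}{\partial\alpha_i}(\alpha_i - \alpha_i^0)
+ \frac{1}{2}\sum_{i=1}^{p}\sum_{j=1}^{p}\frac{\partial^2A(\bar\alpha)}{\partial\alpha_i\,\partial\alpha_j}(\alpha_i - \alpha_i^0)(\alpha_j - \alpha_j^0)
$$

where each element of $\bar\alpha$ lies between the corresponding elements of $\alpha$ and
$\alpha^0$. Since $\alpha^{**}$ is a consistent estimator of $\alpha^0$ under the stated conditions
(see Solution 5.4) it follows that

$$
\sqrt{T}[A(\alpha^{**}) - A(\alpha^0)] = \sum_{i=1}^{p}\frac{\partial A(\alpha^0)}{\partial\alpha_i}\sqrt{T}(\alpha_i^{**} - \alpha_i^0)
+ \frac{1}{2\sqrt{T}}\sum_{i=1}^{p}\sum_{j=1}^{p}\frac{\partial^2A(\bar\alpha)}{\partial\alpha_i\,\partial\alpha_j}\sqrt{T}(\alpha_i^{**} - \alpha_i^0)\sqrt{T}(\alpha_j^{**} - \alpha_j^0). \tag{5.5.4}
$$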


But $\sqrt{T}(\alpha_i^{**} - \alpha_i^0)$ has a limiting normal distribution as $T \to \infty$ (Part (b)
above) for all $i$ and the elements of

$$
\frac{\partial^2A(\bar\alpha)}{\partial\alpha_i\,\partial\alpha_j}
$$

are bounded in probability as $T \to \infty$ (since $\alpha^{**}$ is itself consistent and $\bar\alpha$
lies between $\alpha^{**}$ and $\alpha^0$ in this case). Thus the last term on the right hand
side of (5.5.4) tends to zero in probability (see Proposition 4 on page 370
of Malinvaud, 1970b).

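It now follows that $\sqrt{T}[A(\alpha^{**}) - A(\alpha^0)]$ has the same limiting
distribution as the first term on the right hand side of (5.5.4) (see
Proposition 5 on page 370 of Malinvaud, 1970b). But this term is a linear
combination of random quantities [the $\sqrt{T}(\alpha_i^{**} - \alpha_i^0)$; $i = 1, \ldots, p$]
which have limiting normal distributions so that the limiting distribution
of $\sqrt{T}[A(\alpha^{**}) - A(\alpha^0)]$ is also normal. Note that we can write (5.5.4) in
the form

$$
\sqrt{T}\operatorname{vec}[A(\alpha^{**}) - A(\alpha^0)] = \sum_{i=1}^{p}\frac{\partial\operatorname{vec}A(\alpha^0)}{\partial\alpha_i}\sqrt{T}(\alpha_i^{**} - \alpha_i^0) + o_p(1)
$$

where $o_p(1)$ denotes a term which tends in probability to zero. More
simply

$$
\sqrt{T}\operatorname{vec}[A(\alpha^{**}) - A(\alpha^0)] = \frac{\partial\operatorname{vec}A(\alpha^0)}{\partial\alpha'}\sqrt{T}(\alpha^{**} - \alpha^0) + o_p(1)
$$

and thus $\sqrt{T}\operatorname{vec}[A(\alpha^{**}) - A(\alpha^0)]$ has a limiting normal distribution with
mean vector zero and covariance matrix

$$
\frac{\partial\operatorname{vec}A(\alpha^0)}{\partial\alpha'}\left[\frac{\partial[\operatorname{vec}A(\alpha^0)]'}{\partial\alpha}\,(\Omega^{-1}\otimes M)\,\frac{\partial\operatorname{vec}A(\alpha^0)}{\partial\alpha'}\right]^{-1}\frac{\partial[\operatorname{vec}A(\alpha^0)]'}{\partial\alpha}. \tag{5.5.5}
$$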


Remark We observe that, in view of Condition 5.5.A, the matrix (5.5.5)
has rank $p$ which is less than $nm$, i.e. (5.5.5) has the same rank as the
matrix $\partial\operatorname{vec}A(\alpha^0)/\partial\alpha'$. This means that the limiting distribution of
$\sqrt{T}\operatorname{vec}[A(\alpha^{**}) - A(\alpha^0)]$ is a singular normal distribution (see, for
instance, Cramer, 1946, p. 312 for a discussion of the singular normal
distribution); and the covariance matrix of this limiting distribution is
positive semi-definite.


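Part (d) From (5.5.5) we know that the covariance matrix of the limiting
distribution of $\sqrt{T}\operatorname{vec}[A(\alpha^{**}) - A(\alpha^0)]$ can be written as

$$
\Psi^{**} = B'[B(\Omega^{-1}\otimes M)B']^{-1}B
$$

where

$$
B = \frac{\partial[\operatorname{vec}A(\alpha^0)]'}{\partial\alpha}.
$$

Moreover, the covariance matrix of the limiting distribution of
$\sqrt{T}\operatorname{vec}[A^* - A(\alpha^0)]$ is

$$
\Psi^* = \Omega\otimes M^{-1} \tag{5.5.6}
$$

(c.f. Malinvaud, 1970b, pages 209 and 225). Thus

$$
\begin{aligned}
\Psi^* - \Psi^{**} &= \Omega\otimes M^{-1} - B'[B(\Omega^{-1}\otimes M)B']^{-1}B \\
&= \Psi^* - \Psi^*(\Psi^{*-1}B')[(B\Psi^{*-1})\Psi^*(\Psi^{*-1}B')]^{-1}(B\Psi^{*-1})\Psi^* \\
&= \Psi^* - \Psi^*C'(C\Psi^*C')^{-1}C\Psi^*
\end{aligned} \tag{5.5.7}
$$

where

$$
C = B\Psi^{*-1}.
$$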
We now consider the partitioned matrix

$$
\Phi = \begin{bmatrix}\Psi^{*-1} & C' \\ C & C\Psi^*C'\end{bmatrix}. \tag{5.5.8}
$$

We can write

$$
\Phi = \begin{bmatrix}I \\ C\Psi^*\end{bmatrix}\Psi^{*-1}\begin{bmatrix}I & \Psi^*C'\end{bmatrix}
$$

and, since $\Psi^*$ is positive definite (by assumption on $\Omega$ and $M$),
$\Psi^{*-1} = \Omega^{-1}\otimes M$ is positive semi-definite. (The Kronecker product of two positive
semi-definite matrices is itself positive semi-definite; c.f. Dhrymes, 1970,
page 155). It follows from (5.5.8) that $\Phi$ is also positive semi-definite.
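From (5.5.7) we now have

$$
\Psi^* - \Psi^{**} = \begin{bmatrix}\Psi^* & -\Psi^*C'(C\Psi^*C')^{-1}\end{bmatrix}\Phi\begin{bmatrix}\Psi^* \\ -(C\Psi^*C')^{-1}C\Psi^*\end{bmatrix}
= K\Phi K', \text{ say.} \tag{5.5.9}
$$

But $\Phi$ is positive semi-definite so it follows from the form of (5.5.9) that
$\Psi^* - \Psi^{**}$ is positive semi-definite as required.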


Final Remark We observe that whereas $\Psi^{**}$ is positive semi-definite [see
the Remark at the end of Part (c)], $\Psi^*$ is positive definite. Thus, the
limiting distribution of the unconstrained estimator $A^*$ is non-singular
while that of the constrained estimator $A(\alpha^{**})$ is singular. This is as we
would expect, because in the latter case we are confining the matrix $A$ to
the subset of $nm$-dimensional Euclidean space defined by the equations
$A = A(\alpha)$ where $\alpha$ is a $p$-dimensional vector.


Solution 5.6 


Part (a) The model (5.6.1) is a single equation instance of the constrained
linear model (5.4.3). The constraint $a_2 = a_1^2$ is simple to parameterise in
this case, so that we can write (5.6.1) as

$$
y_t = a_1x_{1t} + a_1^2x_{2t} + u_t \tag{5.6.2}
$$

and the coefficients in (5.6.2) are now simple functions of the parameter
$a_1 = \alpha$, say. Let us now write (5.6.2) as

$$
y_t = a(\alpha)'x_t + u_t \tag{5.6.3}
$$

and it is clear that we can estimate $\alpha$ by minimising

$$
L(\alpha) = \sum_{t=1}^{T}[y_t - a(\alpha)'x_t]^2.
$$


Since the model (5.6.1) involves only a single equation we need not
weight the quadratic form $L(\alpha)$ [compare (5.5.1)] as any such weighting
leads only to the multiplication of $L(\alpha)$ by a positive scalar, which
will not affect the estimate of $\alpha$ we finally obtain.

Rather than directly minimising $L(\alpha)$ we can use the following iterative
procedure:

(i) estimate $a(\alpha)$ by an unrestricted least squares regression on (5.6.3) to
obtain the vector of regression coefficients $a^*$;

(ii) estimate $\alpha$ by minimising $[a^* - a(\alpha)]'M_{xx}[a^* - a(\alpha)]$.

As shown in Part (a) of Solution 5.5 the estimate $\hat\alpha$ obtained from (i) and (ii) is the
same as that obtained by directly minimising $L(\alpha)$.

Once we have found $\hat\alpha$ we estimate $a_1$ and $a_2$ by $\hat a_1 = \hat\alpha$ and $\hat a_2 = \hat\alpha^2$.
Then, as shown in Solution 5.5, $\sqrt{T}(\hat a_1 - a_1)$ and $\sqrt{T}(\hat a_2 - a_2)$ have
limiting normal distributions centred on zero with variances that are at
least as small as those of the limiting distributions of $\sqrt{T}(a_1^* - a_1)$ and
$\sqrt{T}(a_2^* - a_2)$.
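
Steps (i) and (ii) can be sketched in Python with NumPy. This is a minimal illustration under assumed simulated data (sample size, seed, regressors and the true value 0.5 are all hypothetical choices); the second step is solved by finding the real roots of the derivative of the quartic criterion, which is one convenient route and not necessarily the one the authors have in mind.

```python
import numpy as np

rng = np.random.default_rng(1)
T, alpha0 = 500, 0.5                         # hypothetical sample size and true alpha
X = rng.normal(size=(T, 2))                  # columns are x_1t and x_2t
y = alpha0 * X[:, 0] + alpha0**2 * X[:, 1] + rng.normal(size=T)

# Step (i): unrestricted least squares gives a* = (a_1*, a_2*)'
M_xx = X.T @ X / T
a_star = np.linalg.solve(M_xx, X.T @ y / T)

# Step (ii): minimise Q(alpha) = d' M_xx d with d = a* - (alpha, alpha^2)'
d1 = np.poly1d([-1.0, a_star[0]])            # a_1* - alpha
d2 = np.poly1d([-1.0, 0.0, a_star[1]])       # a_2* - alpha^2
m11, m12, m22 = float(M_xx[0, 0]), float(M_xx[0, 1]), float(M_xx[1, 1])
Q = d1 * d1 * m11 + d1 * d2 * (2.0 * m12) + d2 * d2 * m22

crit = Q.deriv().roots                       # stationary points of the quartic
real = crit[np.abs(crit.imag) < 1e-9].real   # keep the real ones
alpha_hat = real[np.argmin(Q(real))]         # global minimiser of Q

# The same alpha_hat should satisfy the first order condition of L(alpha)
resid = y - alpha_hat * X[:, 0] - alpha_hat**2 * X[:, 1]
grad_L = -2.0 * np.sum(resid * (X[:, 0] + 2.0 * alpha_hat * X[:, 1]))

a_hat = np.array([alpha_hat, alpha_hat**2])  # restricted coefficient estimates
print(alpha_hat, grad_L)
```

A gradient of L close to zero at the second-step estimate is what the equivalence in Part (a) of Solution 5.5 leads us to expect.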


Part (b) We now consider the case where 


\ 


‘ i aa 
a, = 0.5,’ a5 = 025 )jose5, hands = 


The covariance matrix of the limiting distribution of $\sqrt{T}(a^* - a)$ is given
by


o?M-! = Sintec 
fe (5.6.4) 
= 1 


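and that of $\sqrt{T}[a(\hat\alpha) - a]$ is given by

$$
\frac{\partial a(\alpha^0)}{\partial\alpha'}\left[\frac{\partial a(\alpha^0)'}{\partial\alpha}\,(\sigma^{-2}M)\,\frac{\partial a(\alpha^0)}{\partial\alpha'}\right]^{-1}\frac{\partial a(\alpha^0)'}{\partial\alpha} \tag{5.6.5}
$$

where

$$
\frac{\partial a(\alpha^0)}{\partial\alpha'} = \begin{bmatrix}1 \\ 2\alpha^0\end{bmatrix} = \begin{bmatrix}1 \\ 1\end{bmatrix}
$$

since $\alpha^0 = 0.5$. Thus (5.6.5) becomes

$$
\begin{bmatrix}1 \\ 1\end{bmatrix}\left\{\begin{bmatrix}1 & 1\end{bmatrix}(\sigma^{-2}M)\begin{bmatrix}1 \\ 1\end{bmatrix}\right\}^{-1}\begin{bmatrix}1 & 1\end{bmatrix}
= \frac{1}{5}\begin{bmatrix}1 & 1 \\ 1 & 1\end{bmatrix}. \tag{5.6.6}
$$

Comparing (5.6.4) and (5.6.6) we see that the variance of the limiting
distribution of $\sqrt{T}(a_1^* - a_1)$ is 2, whereas the variance of the limiting
distribution of $\sqrt{T}(\hat a_1 - a_1)$ is 1/5.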
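
The comparison can be reproduced with a few lines of Python and NumPy. In this sketch the values of `sigma2` and `M` are illustrative assumptions, chosen only so that the unrestricted variance of the first coefficient is 2 and the restricted one is 1/5 as quoted above; they are not taken from the question.

```python
import numpy as np

sigma2 = 1.0                                   # hypothetical disturbance variance
M = np.array([[1.0, 1.0], [1.0, 2.0]])         # hypothetical limit of M_xx

def restricted_cov(alpha0):
    """Formula (5.6.5) for a(alpha) = (alpha, alpha^2)'."""
    d = np.array([[1.0], [2.0 * alpha0]])      # da/dalpha evaluated at alpha0
    info = (d.T @ (M / sigma2) @ d).item()     # scalar in the single-parameter case
    return d @ d.T / info

unrestricted = sigma2 * np.linalg.inv(M)       # formula (5.6.4)
restricted = restricted_cov(0.5)

print(unrestricted[0, 0], restricted[0, 0])
print(restricted_cov(-0.5)[0, 0])              # alternative true value alpha0 = -0.5
```

Changing the argument of `restricted_cov` shows how the restricted variance moves with the true parameter while the unrestricted matrix stays fixed.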


Remark It is worth noting that (5.6.4) takes the same value regardless of
the true values of $a_1$ and $a_2$ whereas (5.6.5) depends explicitly on the
value of the true parameter $\alpha$. As a result, the (asymptotic) variance
reduction that is gained by the use of $a(\hat\alpha)$ rather than $a^*$ can vary
considerably with changes in the value of $\alpha$. The reader may for example
like to try the above calculation again with the alternative value
$\alpha^0 = -0.5$, giving $a_1 = -0.5$ and $a_2 = 0.25$.


Solution 5.7 


Part (a) We denote the matrix of coefficients in the system (5.7.1–2) by

$$
A = \begin{bmatrix}a_{11} & a_{12} \\ a_{21} & a_{22}\end{bmatrix}.
$$

Then the following three-stage procedure produces asymptotically
efficient estimates of $A$ (see Malinvaud, 1970b, pp. 355-358):

(i) Calculate the matrix $A^*$ of unrestricted least squares estimates of $A$.

(ii) Calculate the sample second moment matrix of residuals from the
regression in (i). That is, calculate

$$
M^*_{uu} = T^{-1}\sum_{t=1}^{T}u_t^*u_t^{*\prime}
$$

where $u_t^* = y_t - A^*x_t$, $y_t' = (y_{1t}, y_{2t})$ and $x_t' = (x_{1t}, x_{2t})$.

(iii) Calculate the matrix $A^{**}$ which minimises

$$
\operatorname{tr}[(A^* - A)'(M^*_{uu})^{-1}(A^* - A)M_{xx}] \tag{5.7.4}
$$

subject to the restriction (5.7.3), where $M_{xx} = T^{-1}\sum_{t=1}^{T}x_tx_t'$.

Under the assumptions given in the question, $A^{**}$ is an asymptotically
efficient estimator of $A$ (see Theorem 6 of Malinvaud, 1970b, p. 356).


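Remark Rather than minimise (5.7.4) subject to the restriction (5.7.3), we
can reparameterise the model by writing $A$ explicitly in the form

$$
A(\alpha) = \begin{bmatrix}a_{11} & a_{12}(\alpha) \\ a_{21} & a_{22}\end{bmatrix}
$$

where $\alpha' = (a_{11}, a_{21}, a_{22})$ and $a_{12}(\alpha)$ is the expression for $a_{12}$ implied by
(5.7.3). Then $A^{**} = A(\alpha^{**})$, where $\alpha^{**}$ is the vector
which minimises

$$
\operatorname{tr}\{[A^* - A(\alpha)]'(M^*_{uu})^{-1}[A^* - A(\alpha)]M_{xx}\}
$$

with respect to $\alpha$.
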
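Part (b) In view of the above remark the limiting distribution of
$\sqrt{T}(A^{**} - A)$ is the same as that of $\sqrt{T}[A(\alpha^{**}) - A]$. But the covariance
matrix of the limiting distribution of $\sqrt{T}\operatorname{vec}[A(\alpha^{**}) - A]$ is given by

$$
\frac{\partial\operatorname{vec}A(\alpha^0)}{\partial\alpha'}\left[\frac{\partial[\operatorname{vec}A(\alpha^0)]'}{\partial\alpha}\,(\Omega^{-1}\otimes M)\,\frac{\partial\operatorname{vec}A(\alpha^0)}{\partial\alpha'}\right]^{-1}\frac{\partial[\operatorname{vec}A(\alpha^0)]'}{\partial\alpha} \tag{5.7.5}
$$

(see (5.5.5) above). In the present case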


and 
Oa oN BN Lela 
Moreover 
‘ 1 —)/az; OQ © 
dvec A(q) 
—— ee = 1.0 0 10 
0a 
0 0) Ok al 


so that (5.7.5) becomes 


Ee 408-0 1 -1 0 1 i -1 -1 
sate Oiyal a) 240 Li, oe 2ierrliwer a2 
(aie Eat 0 0 =e i FO Toa 
O-py 0,91 ey eee 
OT): ar Oe"0
rd 7, U, 0 01) egdtinO 
x 
Onl 10 bebe 6 oa 
On «i 
Ma eels ly O end loo are eased) 
A femal Si gine) bl aes Vlei Us a 
=o 
Obese (OLD Pent ay A OnuOes ch 
Cy On at 
cece WU tat 4 2 ate se 0 kU 
Ae |rphes 0500) | Se 282 0 1 0 
On La OG a oe? 2 Or sOmst 
Pe CO 1 
2a 2 | ir | 
ire Cal) wet 1 
= 0? (5.7.4) 


De oe ol 
al | ier 1 
On the other hand the covariance matrix of the limiting distribution of 
\/T vec (A* — A) is given by 
QeMz3 
[see (5.5.6) above] which, in this case, is 
ea ey’ ovis | 
scone ats lod? I (5.7.5) 
fe ied 7 Sat ah 
Sail! 1 al 1 


Subtracting (5.7.4) from (5.7.5) it is easy to see that the resulting matrix 


age Pe 
Bao £0 0930 
0 

0 4) 

0 0 0 0 




is positive semi-definite, as theory suggests (see Solution 5.5, Part (d)).
Taking the diagonal elements of the matrices (5.7.4) and (5.7.5), we see
that the variance of the limiting distribution of $\sqrt{T}(a_{ij}^{**} - a_{ij})$ is less
than that of $\sqrt{T}(a_{ij}^* - a_{ij})$ for the components $a_{11}$ and $a_{21}$. The variances
are equal in the case of $a_{12}$ and $a_{22}$.


Remark Since the constraint (5.7.3) involves coefficients of the first 
equation, it is not surprising that the non-linear regression leads to an 
efficiency gain (asymptotically) for one of the coefficients of this 
equation. The fact that there is an asymptotic efficiency gain from the 
non-linear regression in the case of a coefficient of the second equation is 
important and is the result of the inter-equation disturbance correlation. 
The reader may like to try the same question again with

$$
\Omega = \begin{bmatrix}1 & 0 \\ 0 & 1\end{bmatrix}.
$$


Solution 5.8


Part (a) The given model is a single-equation errors-in-variables model. We 
know that in such models, least squares regression does not, in general, 
provide consistent estimates (see, for instance, Malinvaud, 1970b, pp. 
379—380). Thus it is proposed in the question to use an orthogonal 
regression. 
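The perpendicular distance of the point $(y_t, x_t)$ from the line

$$
y = \alpha + \beta x
$$

is given by

$$
\frac{y_t - \alpha - \beta x_t}{\sqrt{1 + \beta^2}}
$$

(see, for instance, Brown and Manson, 1950). Thus, the orthogonal
regression estimates of $\alpha^0$ and $\beta^0$ in (5.8.1) are obtained by minimising
with respect to $\alpha$ and $\beta$ the sum of squares

$$
\sum_{t=1}^{T}\left[\frac{y_t - \alpha - \beta x_t}{\sqrt{1 + \beta^2}}\right]^2
= \frac{1}{1 + \beta^2}\sum_{t=1}^{T}(y_t - \alpha - \beta x_t)^2. \tag{5.8.4}
$$

We can now concentrate (5.8.4) as a function of $\beta$. Setting to zero the first
derivative of (5.8.4) with respect to $\alpha$ we get

$$
-\frac{2}{1 + \beta^2}\sum_{t=1}^{T}(y_t - \alpha - \beta x_t) = 0
$$

so that

$$
\alpha = \bar y - \beta\bar x \tag{5.8.5}
$$

where

$$
\bar y = T^{-1}\sum_{t=1}^{T}y_t \quad\text{and}\quad \bar x = T^{-1}\sum_{t=1}^{T}x_t,
$$

when (5.8.4) attains its minimum. Next we concentrate (5.8.4) as a
function of $\beta$ by substituting (5.8.5) into (5.8.4). We obtain as the new
expression to be minimised

$$
\frac{1}{1 + \beta^2}\sum_{t=1}^{T}[(y_t - \bar y) - \beta(x_t - \bar x)]^2
= \frac{T}{1 + \beta^2}\,(m_{yy} - 2\beta m_{yx} + \beta^2m_{xx}) \tag{5.8.6}
$$

where

$$
m_{yy} = T^{-1}\sum_{t=1}^{T}(y_t - \bar y)^2, \quad m_{yx} = T^{-1}\sum_{t=1}^{T}(y_t - \bar y)(x_t - \bar x)
\quad\text{and}\quad m_{xx} = T^{-1}\sum_{t=1}^{T}(x_t - \bar x)^2.
$$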

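We can write (5.8.6) as

$$
\frac{T}{1 + \beta^2}\begin{bmatrix}1 & -\beta\end{bmatrix}\begin{bmatrix}m_{yy} & m_{yx} \\ m_{yx} & m_{xx}\end{bmatrix}\begin{bmatrix}1 \\ -\beta\end{bmatrix} \tag{5.8.7}
$$

which is at least as great as

$$
\frac{T}{1 + \beta^2}\,\lambda_m(1 + \beta^2) = T\lambda_m \tag{5.8.8}
$$

where $\lambda_m$ is the smaller of the two eigenvalues of the matrix in (5.8.7).
Moreover, (5.8.8) is attained when $\beta$ takes on the value ($\hat\beta$, say) for
which

$$
\begin{bmatrix}m_{yy} & m_{yx} \\ m_{yx} & m_{xx}\end{bmatrix}\begin{bmatrix}1 \\ -\hat\beta\end{bmatrix} = \lambda_m\begin{bmatrix}1 \\ -\hat\beta\end{bmatrix} \tag{5.8.9}
$$

that is, when $(1, -\hat\beta)'$ is the eigenvector corresponding to $\lambda_m$. From the
first equation in (5.8.9) we have

$$
m_{yy} - \hat\beta m_{yx} = \lambda_m
$$

so that

$$
\hat\beta = (m_{yy} - \lambda_m)/m_{yx}. \tag{5.8.10}
$$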

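To find $\hat\beta$ we now need only find $\lambda_m$. But $\lambda_m$ is the smaller of the roots of
the equation

$$
\det\begin{bmatrix}m_{yy} - \lambda & m_{yx} \\ m_{yx} & m_{xx} - \lambda\end{bmatrix} = 0.
$$

That is

$$
\lambda^2 - (m_{xx} + m_{yy})\lambda - (m_{yx}^2 - m_{xx}m_{yy}) = 0
$$

from which we have

$$
\lambda = \frac{m_{xx} + m_{yy} \pm [(m_{xx} + m_{yy})^2 + 4(m_{yx}^2 - m_{xx}m_{yy})]^{1/2}}{2}
= \frac{m_{xx} + m_{yy} \pm [(m_{xx} - m_{yy})^2 + 4m_{yx}^2]^{1/2}}{2}.
$$

Clearly

$$
\lambda_m = \frac{m_{xx} + m_{yy} - [(m_{xx} - m_{yy})^2 + 4m_{yx}^2]^{1/2}}{2}
$$

so that from (5.8.10) we obtain

$$
\hat\beta = \frac{m_{yy} - m_{xx} + [(m_{yy} - m_{xx})^2 + 4m_{yx}^2]^{1/2}}{2m_{yx}} \tag{5.8.11}
$$

and, hence, from (5.8.5)

$$
\hat\alpha = \bar y - \hat\beta\bar x. \tag{5.8.12}
$$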

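The estimates $\hat\alpha$ and $\hat\beta$ can alternatively be thought of as being obtained
by a weighted regression. To define the latter we introduce the notation

$$
z_t = \begin{bmatrix}y_t \\ x_t\end{bmatrix}, \quad Z_t = \begin{bmatrix}Y_t \\ X_t\end{bmatrix}
\quad\text{and}\quad w_t = \begin{bmatrix}u_t \\ v_t\end{bmatrix}.
$$

We let $\Omega = E(w_tw_t')$. The model comprising (5.8.1–3) can now be
written as

$$
z_t = Z_t + w_t, \qquad (1, -\beta^0)Z_t = \alpha^0
$$

and a weighted regression is obtained by minimising the quadratic form

$$
\sum_{t=1}^{T}(z_t - Z_t)'\Omega^{-1}(z_t - Z_t). \tag{5.8.13}
$$

But, by assumption, in the present case

$$
\Omega = \begin{bmatrix}\sigma^2 & 0 \\ 0 & \sigma^2\end{bmatrix}
$$

so that (5.8.13) becomes

$$
\frac{1}{\sigma^2}\sum_{t=1}^{T}[(y_t - Y_t)^2 + (x_t - X_t)^2]
= \frac{1}{\sigma^2}\sum_{t=1}^{T}[(y_t - \alpha - \beta X_t)^2 + (x_t - X_t)^2]. \tag{5.8.14}
$$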


The weighted regression then amounts to minimising (5.8.14) with respect
to $\alpha$, $\beta$ and $X_1, \ldots, X_T$. Concentrating (5.8.13) first in terms of $\alpha$ and $\beta$
we set the derivatives of (5.8.14) with respect to the $X_t$ equal to zero
giving

$$
(-\beta)(y_t - \alpha - \beta X_t) - (x_t - X_t) = 0 \qquad (t = 1, \ldots, T).
$$

Thus

$$
x_t - X_t = -\beta(y_t - \alpha - \beta X_t) \tag{5.8.15}
$$

or

$$
(1 + \beta^2)X_t = x_t + \beta y_t - \beta\alpha
$$

so that

$$
X_t = \frac{1}{1 + \beta^2}(x_t + \beta y_t - \beta\alpha). \tag{5.8.16}
$$

From (5.8.15) we see that (5.8.13) can be written as

$$
\frac{1}{\sigma^2}(1 + \beta^2)\sum_{t=1}^{T}(y_t - \alpha - \beta X_t)^2
$$

which, using (5.8.16), becomes

$$
\frac{1}{\sigma^2}\,\frac{1}{1 + \beta^2}\sum_{t=1}^{T}(y_t - \alpha - \beta x_t)^2.
$$


Thus, minimisation of (5.8.13) is, in the present case, equivalent to the
minimisation of (5.8.4) so that $\hat\alpha$ and $\hat\beta$ can be thought of as being
obtained by a weighted regression. It follows that $\hat\alpha$ and $\hat\beta$ will in this case
have the properties of weighted regression estimators. A detailed
discussion of weighted regression and its properties in the context of
models with errors in variables is given by Malinvaud (1970b, pp.
383-394). Malinvaud (Proposition 1 on page 387) proves, in particular,
that a weighted regression in this context leads to consistent estimates
provided the following additional assumption is satisfied.


Assumption 5.8.A (Malinvaud's Assumption 3 on page 377). The
unknown vectors $Z_t$ all satisfy the equation $(1, -\beta^0)Z_t = \alpha^0$. The matrix

$$
M_{ZZ} = T^{-1}\sum_{t=1}^{T}(Z_t - \bar Z)(Z_t - \bar Z)',
$$

where $\bar Z = T^{-1}\sum_{t=1}^{T}Z_t$, is of rank 1. As $T \to \infty$, $\bar Z$ tends to a finite limit and
$M_{ZZ}$ tends to a finite matrix of rank 1.


Remark 1 It is implicitly assumed in Assumption 5.8.A that the sequence
$\{Z_t\}$ involves non-random quantities. We note that since this sequence
must satisfy the equation $(1, -\beta^0)Z_t = \alpha^0$, it follows that

$$
(1, -\beta^0)(Z_t - \bar Z) = 0
$$

and hence $M_{ZZ}$ cannot have full rank. Moreover, if we let

$$
\mu = \lim_{T\to\infty}T^{-1}\sum_{t=1}^{T}(X_t - \bar X)^2
$$

then it is clear from the fact that $Y_t - \bar Y = \beta^0(X_t - \bar X)$ that $M_{ZZ}$ tends to
the limit matrix

$$
\mu\begin{bmatrix}(\beta^0)^2 & \beta^0 \\ \beta^0 & 1\end{bmatrix}.
$$

Remark 2 Under the further assumption that $u_t$ and $v_t$ are both normally
distributed we can deduce from the form of the likelihood function (and,
in particular, the presence of the quadratic form (5.8.13) in the exponent
of the likelihood function) that $\hat\alpha$ and $\hat\beta$ are in this case also the maximum
likelihood estimates (c.f. Malinvaud, 1970b, p. 387).


Part (b) From the formulae (5.8.10) and (5.8.11) above we have

$$
\hat\beta = 2, \qquad \hat\alpha = 100 - 2 \times 25 = 50.
$$


Solution 5.9 


The model in the question is a regression model with a single unobservable 
independent variable. This type of model has been the subject of a 
number of recent investigations and the reader is referred particularly to 


Zellner (1970), Goldberger (1972a and 1972b) and Griliches (1974). 


Part (a) Introducing the notation $y_1' = (y_{11}, \ldots, y_{1T})$, $y_2' = (y_{21}, \ldots, y_{2T})$
and $X' = (x_1, \ldots, x_T)$ we see that (5.9.1) and (5.9.2) can be written as

$$
y_1 = (X\beta)\gamma + u_1 \tag{5.9.4}
$$

$$
y_2 = X\beta + u_2 \tag{5.9.5}
$$

where $u_1' = (u_{11}, \ldots, u_{1T})$ and $u_2' = (u_{21}, \ldots, u_{2T})$. From (5.9.5) we
have

$$
X'y_2 = X'X\beta + X'u_2
$$

and

$$
\frac{X'y_2}{T} = \frac{X'X}{T}\,\beta + \frac{X'u_2}{T}.
$$

But the covariance matrix of $X'u_2/T$ is

$$
\sigma_{22}\,\frac{X'X}{T^2} = \sigma_{22}\,\frac{M_{xx}}{T}
$$

which, in view of the stated assumption about $M_{xx}$, tends to zero as $T \to \infty$.
Thus, by Tchebycheff's Theorem (c.f. the solution to 2.15 above)

$$
\operatorname{plim}\frac{X'y_2}{T} = \left(\lim\frac{X'X}{T}\right)\beta = M_{xx}\beta, \tag{5.9.6}
$$

say. Now, from the regression of $y_2$ on $X$ we obtain the vector of
calculated values


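$$
\hat y_2 = X\hat\beta = X(X'X)^{-1}X'y_2
$$

and from the regression of $y_1$ on $\hat y_2$ we have the estimator of $\gamma$ given by

$$
\hat\gamma = \frac{\hat y_2'y_1}{\hat y_2'\hat y_2}. \tag{5.9.7}
$$

But

$$
y_1 = X\beta\gamma + u_1 = \hat y_2\gamma + X(\beta - \hat\beta)\gamma + u_1. \tag{5.9.8}
$$

Substituting (5.9.8) into (5.9.7) we have

$$
\hat\gamma = \gamma + \frac{\hat y_2'X(\beta - \hat\beta)\gamma}{\hat y_2'\hat y_2} + \frac{\hat y_2'u_1}{\hat y_2'\hat y_2}. \tag{5.9.9}
$$

Now

$$
\hat y_2'\hat y_2 = y_2'X(X'X)^{-1}X'y_2
$$

and

$$
\hat y_2'X = y_2'X
$$

so that we can write the second term on the right side of (5.9.9) as

$$
\frac{y_2'X(\beta - \hat\beta)\gamma}{y_2'X(X'X)^{-1}X'y_2}
= \frac{\left(\dfrac{y_2'X}{T}\right)(\beta - \hat\beta)\gamma}{\left(\dfrac{y_2'X}{T}\right)\left(\dfrac{X'X}{T}\right)^{-1}\left(\dfrac{X'y_2}{T}\right)}. \tag{5.9.10}
$$

But, it follows from (5.9.6) that $\hat\beta$ is a consistent estimator of $\beta$ and

$$
\operatorname{plim}\left(\frac{y_2'X}{T}\right)\left(\frac{X'X}{T}\right)^{-1}\left(\frac{X'y_2}{T}\right) = \beta'M_{xx}M_{xx}^{-1}M_{xx}\beta = \beta'M_{xx}\beta
$$

which is positive for $\beta \neq 0$. Hence (5.9.10) tends to zero in probability
as $T \to \infty$. Similarly we have

$$
\frac{\hat y_2'u_1}{\hat y_2'\hat y_2}
= \frac{\left(\dfrac{y_2'X}{T}\right)\left(\dfrac{X'X}{T}\right)^{-1}\left(\dfrac{X'u_1}{T}\right)}{\left(\dfrac{y_2'X}{T}\right)\left(\dfrac{X'X}{T}\right)^{-1}\left(\dfrac{X'y_2}{T}\right)}. \tag{5.9.11}
$$

But $X'u_1/T$ tends to zero in probability for the same reason as does
$X'u_2/T$ and it follows, therefore, that

$$
\operatorname{plim}\frac{\hat y_2'u_1}{\hat y_2'\hat y_2} = 0.
$$

Since the last two terms in (5.9.9) have now been shown to tend to zero
in probability as $T \to \infty$ it follows that $\hat\gamma$ is a consistent estimator of $\gamma$.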
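
The two-step estimator in (5.9.7) is simple to simulate. The sketch below, in Python with NumPy, uses hypothetical parameter values, seed and sample sizes chosen only for illustration; with the larger sample the estimate should lie close to the assumed true value of the coefficient.

```python
import numpy as np

def two_step_gamma(y1, y2, X):
    """Regress y2 on X, then regress y1 on the calculated values of y2."""
    beta_hat = np.linalg.lstsq(X, y2, rcond=None)[0]
    y2_fit = X @ beta_hat                     # calculated values from step one
    return (y2_fit @ y1) / (y2_fit @ y2_fit)  # formula (5.9.7)

rng = np.random.default_rng(2)
beta = np.array([1.0, -0.5])                  # hypothetical beta
gamma = 2.0                                   # hypothetical gamma

def simulate(T):
    X = rng.normal(size=(T, 2))
    y1 = (X @ beta) * gamma + rng.normal(size=T)   # equation (5.9.4)
    y2 = X @ beta + rng.normal(size=T)             # equation (5.9.5)
    return two_step_gamma(y1, y2, X)

estimates = {T: simulate(T) for T in (100, 10_000)}
print(estimates)
```

A single pair of draws only illustrates the consistency argument; it does not replace it.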




Part (b) The procedure in Part (a) involves two steps, in which the
regression of $y_2$ on $X$ provides calculated values $\hat y_2$ which are used as
instruments for the unobservable variable in (5.9.1). Thus, the procedure is essentially an
instrumental variables method of estimation where the regression of $y_2$ on
$X$ in the first step is designed to provide instruments for use in the
second step. However, it could be argued that the procedure fails to take
account of all our information about the structure of the system. Writing
(5.9.4) and (5.9.5) as

$$
y_1 = X\delta + u_1 \qquad (\delta = \beta\gamma) \tag{5.9.12}
$$

$$
y_2 = X\beta + u_2 \tag{5.9.13}
$$


we see that a number of cross-equation parameter restrictions are implied
by the fact that the vector $\delta$ in (5.9.12) is proportional to the vector $\beta$ in
(5.9.13). In this case, we would expect to obtain more precise estimates
of $\beta$ and $\gamma$ by estimating (5.9.12) and (5.9.13) jointly. Furthermore, we
should take into account in our estimation that there is zero covariance
between the disturbances of the two equations. Zellner (1970) considers
these points and develops a procedure based on generalised least squares
applied to the joint system.


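The joint system can be written as

$$
\begin{bmatrix}y_1 \\ y_2\end{bmatrix} = \begin{bmatrix}X\beta\gamma \\ X\beta\end{bmatrix} + \begin{bmatrix}u_1 \\ u_2\end{bmatrix} \tag{5.9.14}
$$

where the covariance matrix of the vector of disturbances in (5.9.14) is

$$
\begin{bmatrix}\sigma_{11}I_T & 0 \\ 0 & \sigma_{22}I_T\end{bmatrix}.
$$

We minimise with respect to $\beta$ and $\gamma$

$$
\begin{aligned}
&\begin{bmatrix}(y_1 - X\beta\gamma)' & (y_2 - X\beta)'\end{bmatrix}
\begin{bmatrix}\sigma_{11}I_T & 0 \\ 0 & \sigma_{22}I_T\end{bmatrix}^{-1}
\begin{bmatrix}y_1 - X\beta\gamma \\ y_2 - X\beta\end{bmatrix} \\
&\qquad = \frac{1}{\sigma_{11}}(y_1 - X\beta\gamma)'(y_1 - X\beta\gamma) + \frac{1}{\sigma_{22}}(y_2 - X\beta)'(y_2 - X\beta).
\end{aligned}
$$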
On O22 


Zellner takes two cases. The first, when the ratio $\lambda = \sigma_{11}/\sigma_{22}$ is known
and the second, when there is no prior knowledge of $\sigma_{11}$ and $\sigma_{22}$. In the
first, we can obtain an explicit solution to the problem conditional on the
given value of $\lambda$. In the second, Zellner suggests that we use the same
solution and estimate the ratio $\lambda$ by

$$
\hat\lambda = \frac{(y_1 - X\hat\delta)'(y_1 - X\hat\delta)}{(y_2 - X\hat\beta)'(y_2 - X\hat\beta)} = \frac{\hat\sigma_{11}}{\hat\sigma_{22}},
$$

say, where $\hat\sigma_{11}$ and $\hat\sigma_{22}$ are estimates of $\sigma_{11}$ and $\sigma_{22}$ obtained from the
residuals of the regressions of $y_1$ on $X$ and $y_2$ on $X$ respectively.

In the first case, the estimators obtained are maximum likelihood
estimators when the disturbances are normally distributed. Maximum
likelihood estimators in the more general case where $u_{1t}$ and $u_{2t}$ are
contemporaneously correlated [that is $E(u_{1t}u_{2t}) = \sigma_{12} \neq 0$, whereas
$E(u_{1t}u_{2s}) = 0$, $t \neq s$] are considered by Goldberger (1972a) and Wickens
(1976). The asymptotic variances of these estimators do not seem to have
appeared in the literature. Since the estimator of $\gamma$ involves quadratic terms
in the observation vectors $y_1$ and $y_2$ (see Zellner, 1970, page 444) these
asymptotic variances will depend on the fourth moments of the
disturbances $u_{1t}$ and $u_{2t}$ (compare the derivations in Malinvaud, 1970b,
pp. 390-391).


Solution 5.10 


The model of production which leads to (5.10.1) was first developed by
Solow (1960). It is based on the hypothesis that new technical knowledge
affects production by being incorporated in new capital equipment. In
this sense, technical progress is said to be embodied and is distinct from
what is usually called disembodied technical progress which, in Solow's words,
"floats down from the outside" and thus affects production without
being incorporated in new equipment.

In deriving (5.10.1) Solow distinguishes between capital of different
vintages. Thus, $K_v(t)$ represents capital services at time $t$ which are
obtained from vintage $v$ equipment ($v \le t$). Similarly, $L_v(t)$ represents
labour services at time $t$ that are working with vintage $v$ equipment and
$Q_v(t)$ is the output produced at time $t$ from vintage $v$ equipment, i.e.
$K_v(t)$, with labour services $L_v(t)$. More specifically, $Q_v(t)$ is assumed to be
determined by

$$
Q_v(t) = Be^{\lambda v}L_v(t)^{\alpha}K_v(t)^{1-\alpha} \tag{5.10.3}
$$

so that the technical relation between $L_v(t)$, $K_v(t)$ and $Q_v(t)$ is
Cobb-Douglas and improvements in technology take place at the time ($v$)
when capital is installed or built. This means that there is a once and for
all improvement in the technology that is incorporated in capital of
vintage $v$ represented by the factor $e^{\lambda v}$ in (5.10.3). This improvement
takes place instantaneously at time $v$ and capital of this vintage cannot
then benefit from further technical advances that take place at a later
time. If $I(v)$ represents the actual investment in vintage $v$ equipment at
time $v$, then we write

$$
K_v(t) = e^{-\delta(t-v)}I(v) \tag{5.10.4}
$$

so that the efficiency of vintage $v$ equipment is assumed to decline
exponentially at a rate $\delta$ over time.

Total output $Q(t)$ is the sum total of output produced from equipment
of all vintages so that

$$
Q(t) = \int_{-\infty}^{t}Q_v(t)\,dv. \tag{5.10.5}
$$

Similarly, the total amount of labour employed at time $t$ is given by

$$
L(t) = \int_{-\infty}^{t}L_v(t)\,dv. \tag{5.10.6}
$$

Solow then derives the final form of the production function (5.10.1)
from equations (5.10.3) to (5.10.6) together with the profit maximising
assumption that in competitive labour markets labour is allocated to work
with capital of different vintages in such a way that its marginal
productivity is the same in every use.


Part (a) It is difficult to make direct use of (5.10.1) in empirical work
because of the complicated non-linear way in which the parameters $\alpha$, $\delta$
and $\lambda$ enter the equation. In particular, not only is the component of the
model

$$
\int_{-\infty}^{t}e^{\sigma v}I(v)\,dv, \qquad \sigma = \delta + \frac{\lambda}{1 - \alpha},
$$

unobservable, but it is defined directly in terms of the unknown
parameters. To overcome these difficulties Solow devised an ingenious
procedure for transforming the model into an estimable form. He
introduced the new variable

$$
R(t) = \left[\frac{Q(t)}{L(t)^{\alpha}}\right]^{1/(1-\alpha)} \tag{5.10.7}
$$

so that from (5.10.1) we have

$$
R(t) = B^{1/(1-\alpha)}e^{-\delta t}\int_{-\infty}^{t}e^{\sigma v}I(v)\,dv
$$

and then by differentiation we obtain

$$
\frac{dR(t)}{dt} = -\delta R(t) + B^{1/(1-\alpha)}e^{(\sigma-\delta)t}I(t).
$$

Since $\sigma - \delta = \lambda/(1 - \alpha)$ we therefore have

$$
\frac{dR(t)/dt + \delta R(t)}{I(t)} = B^{1/(1-\alpha)}e^{\lambda t/(1-\alpha)} \tag{5.10.8}
$$



from which we obtain (5.10.2) by taking logarithms.

Given that $\alpha$ and $\delta$ can be estimated extraneously the left hand side of
(5.10.8) can be calculated approximately by replacing $dR(t)/dt$ by the
first difference $\Delta R(t) = R(t) - R(t - 1)$ and constructing a series for $R(t)$
from time series for $Q(t)$ and $L(t)$ and then using (5.10.7) to find $R(t)$
from the extraneous estimate of $\alpha$. Once this has been done, the slope
coefficient $\lambda/(1 - \alpha)$ is readily estimated from (5.10.2) from which we
derive an estimate of $\lambda$ using the extraneous estimate of $\alpha$.


Part (b) In spite of the fact that this procedure is very ingenious it is far
from satisfactory. (Solow himself was undoubtedly aware of this as is
clear from section 4 of his paper.) This is so for a number of reasons. First
of all the procedure leads to estimates of $\lambda$ which are conditional on the
extraneous estimates of $\alpha$ and $\delta$. These latter estimates will themselves be
subject to sampling error which will lead to a further source of variation in
the estimate of $\lambda$ obtained from (5.10.2). To see this we need only note
that if $b$ is the estimated coefficient of $t$ in the regression on (5.10.2) then
our estimate of $\lambda$ is

$$
\hat\lambda = b(1 - \hat\alpha)
$$

where $\hat\alpha$ is the extraneous estimate of $\alpha$. The sampling variation of $\hat\lambda$ then
depends on that of $b$ and $\hat\alpha$. To use (5.10.2) for statistical inference about
$\lambda$ we should at least have some measure of the sampling variation of $\hat\alpha$.

Moreover, in the production function (5.10.1) it is implicitly assumed
that there is no disembodied technical progress at all. It may seem more
acceptable to start with a model in which there is both embodied and
disembodied technical progress. This leads us to a function of the form

$$
Q(t) = Be^{[\mu - \delta(1-\alpha)]t}L(t)^{\alpha}\left[\int_{-\infty}^{t}e^{\sigma v}I(v)\,dv\right]^{1-\alpha} \tag{5.10.9}
$$

where $\mu$ is the rate of disembodied technical progress. We then find the
following equation in place of (5.10.2):

$$
\ln\left\{\frac{dR(t)/dt + \left(\delta - \dfrac{\mu}{1-\alpha}\right)R(t)}{I(t)}\right\} = \frac{1}{1-\alpha}\ln B + \frac{\lambda + \mu}{1-\alpha}\,t.
$$

Thus, when $\mu$ is positive rather than zero, we would expect the estimate of
$\lambda$ obtained from the regression on (5.10.2) to be biased upwards.

These points were considered by Wickens (1970) who, in addition,
developed a procedure for estimating the function (5.10.1) more directly
than Solow. Working with (5.10.9) rather than (5.10.1), Wickens
transformed the model into the form

$$
\ln[Q(t)/L(t)] = \ln B + [\mu - \delta(1-\alpha)]t + (1-\alpha)\ln\left[\int_{-\infty}^{t}e^{\sigma v}I(v)\,dv\Big/L(t)\right]. \tag{5.10.10}
$$

To treat the non-linearity in the last term of (5.10.10), Wickens suggests
that we use the following approximation

$$
\ln\left[\int_{-\infty}^{t}e^{\sigma v}I(v)\,dv\Big/L(t)\right] \simeq \ln\left[\int_{-\infty}^{t}e^{\bar\sigma v}I(v)\,dv\Big/L(t)\right]
+ (\sigma - \bar\sigma)\,\frac{\int_{-\infty}^{t}ve^{\bar\sigma v}I(v)\,dv}{\int_{-\infty}^{t}e^{\bar\sigma v}I(v)\,dv} \tag{5.10.11}
$$

which is based on the first two terms of the Taylor expansion of

$$
\ln\left[\int_{-\infty}^{t}e^{\sigma v}I(v)\,dv\Big/L(t)\right] \tag{5.10.12}
$$

about its value at $\sigma = \bar\sigma$, where $\bar\sigma$ represents an initial estimate of $\sigma$. (Note
that the second term on the right side of (5.10.11) is obtained by
differentiating (5.10.12) with respect to $\sigma$ and treating it as a function of
a function.) Substituting (5.10.11) into (5.10.10) we obtain the equation

$$
\begin{aligned}
\ln[Q(t)/L(t)] = {} & \ln B + [\mu - \delta(1-\alpha)]t
+ (1-\alpha)\ln\left[\int_{-\infty}^{t}e^{\bar\sigma v}I(v)\,dv\Big/L(t)\right] \\
& + (1-\alpha)(\sigma - \bar\sigma)\,\frac{\int_{-\infty}^{t}ve^{\bar\sigma v}I(v)\,dv}{\int_{-\infty}^{t}e^{\bar\sigma v}I(v)\,dv}.
\end{aligned} \tag{5.10.13}
$$

This equation is non-linear in parameters and linear in the
following variables:

$$
\ln\frac{Q(t)}{L(t)}, \qquad \ln\left[\int_{-\infty}^{t}e^{\bar\sigma v}I(v)\,dv\Big/L(t)\right] \qquad\text{and}\qquad
\int_{-\infty}^{t}ve^{\bar\sigma v}I(v)\,dv\Big/\int_{-\infty}^{t}e^{\bar\sigma v}I(v)\,dv. \tag{5.10.14}
$$

Using the initial estimate $\bar\sigma$, Wickens shows how the last two variables can
be computed from observations of $L(t)$ and observations (over unit time
periods) of $I(v)$. The coefficients of the variables on the right side of
(5.10.13) can then be estimated by ordinary least squares. Wickens
proposes an iterative procedure in which we now use these estimated
coefficients to compute a new estimate of $\sigma$ and hence recalculate the last
two variables of (5.10.14). We should then estimate (5.10.13) again and
iterate in this way until the estimates of $\sigma$ from successive iterations have
converged.


This procedure has the great advantage of not requiring extraneous
estimates of some parameters as in the Solow method. We can then
estimate the standard errors of the estimated coefficients of (5.10.13).
One final remark should be made. Since we have in this equation
introduced disembodied technical progress through the parameter $\mu$, we
note that, apart from the constant in (5.10.13), there are three
coefficients and four parameters [$\mu$, $\delta$, $\alpha$ and $\lambda$; since $\sigma = \delta + \lambda/(1 - \alpha)$].
We will not as a result be able to separately identify $\mu$, $\delta$ and $\lambda$ so that
these parameters cannot be specifically estimated from the regression.
This does not, however, prevent us from drawing some inferences about $\lambda$
and the reader is referred to Wickens (1970) for further details on this.


Appendix 


A. THE VEC( ) OPERATOR 


Definition If $A$ is an $n \times m$ matrix and $a_i$ denotes the $i$th row of $A$ then
we write

$$
\operatorname{vec}(A) = \begin{bmatrix}a_1' \\ a_2' \\ \vdots \\ a_n'\end{bmatrix}.
$$

Clearly $\operatorname{vec}(A)$, which is an $nm \times 1$ vector, is an alternative
representation of the elements of the matrix $A$. This alternative
representation is particularly useful in algebraic manipulations when $A$ is a
random matrix, for the second moment properties of the elements of $A$
can often be more conveniently analysed when these elements are
arranged in vector form (so that there is a matrix of second moments)
than when the elements are arranged in a rectangular array. A simple
example is the least squares estimator $A^* = M_{yx}M_{xx}^{-1}$ of $A$ in the multiple
equation regression model $y_t = Ax_t + u_t$ (see Solutions 3.1, 3.2 and 3.3).
In this case, the covariance matrix of $\operatorname{vec}(A^*)$ is given by

$$
E[\operatorname{vec}(A^*) - \operatorname{vec}(A)][\operatorname{vec}(A^*) - \operatorname{vec}(A)]' = T^{-1}\,\Omega\otimes M_{xx}^{-1}
$$

where $\Omega$ is the covariance matrix of the disturbance vector $u_t$ (see
Goldberger, 1964, p. 209, and Malinvaud, 1970, p. 209).
The main properties of the vec operation which we find useful are as 


follows: 
Property (i) $\operatorname{vec}(ABC) = (A\otimes C')\operatorname{vec}(B)$

where $A$, $B$ and $C$ are rectangular arrays of dimension $n \times m$, $m \times p$ and
$p \times q$ respectively.


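Proof We let $D = ABC$ and write $A = [(a_{ij})]$, $B = [(b_{ij})]$, $C = [(c_{ij})]$ and
$D = [(d_{ij})]$. Then

$$
d_{ij} = \sum_{k=1}^{m}\sum_{l=1}^{p}a_{ik}b_{kl}c_{lj} = \sum_{k=1}^{m}\sum_{l=1}^{p}a_{ik}c'_{jl}b_{kl}
$$

where $c'_{jl}$ is the $(j, l)$th element of $C'$. The matrix $D$ has dimension $n \times q$
so that there are $q$ elements in each row of $D$ and therefore the $[(i-1)q + j]$th
element of $\operatorname{vec}(D)$ is $d_{ij} = \sum_k\sum_l a_{ik}c'_{jl}b_{kl}$. But

$$
A\otimes C' = \begin{bmatrix}a_{11}C' & a_{12}C' & \cdots & a_{1m}C' \\ \vdots & \vdots & & \vdots \\ a_{n1}C' & a_{n2}C' & \cdots & a_{nm}C'\end{bmatrix}
$$

so that the $[(i-1)q + j]$th row of $A\otimes C'$ is the $j$th row of
$[a_{i1}C', \ldots, a_{im}C']$, that is

$$
[a_{i1}(c'_{j1}, \ldots, c'_{jp}), \ldots, a_{im}(c'_{j1}, \ldots, c'_{jp})]. \tag{A.1}
$$

Now since

$$
B = \begin{bmatrix}b_{11} & b_{12} & \cdots & b_{1p} \\ \vdots & \vdots & & \vdots \\ b_{m1} & b_{m2} & \cdots & b_{mp}\end{bmatrix}
$$

we have

$$
\operatorname{vec}(B) = (b_{11}, \ldots, b_{1p}, \ldots, b_{m1}, \ldots, b_{mp})' \tag{A.2}
$$

and the $[(i-1)q + j]$th element of $(A\otimes C')\operatorname{vec}(B)$ is the scalar product of
(A.1) and (A.2). That is

$$
\sum_{k=1}^{m}\sum_{l=1}^{p}a_{ik}c'_{jl}b_{kl}.
$$

Thus $\operatorname{vec}(D)$ and $(A\otimes C')\operatorname{vec}(B)$ have the same $[(i-1)q + j]$th elements and
since this holds for all $i$ and $j$ we have

$$
\operatorname{vec}(D) = (A\otimes C')\operatorname{vec}(B).
$$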
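
Property (i) is easy to confirm numerically. In the Python sketch below (NumPy assumed; the matrices are arbitrary random draws), row stacking corresponds to flattening an array in NumPy's default row-major order, and `np.kron` forms the Kronecker product.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, p, q = 2, 3, 4, 5                 # arbitrary illustrative dimensions
A = rng.normal(size=(n, m))
B = rng.normal(size=(m, p))
C = rng.normal(size=(p, q))

def vec(M):
    """Stack the rows of M into one long vector (row-major flattening)."""
    return M.reshape(-1)

lhs = vec(A @ B @ C)                    # vec(ABC)
rhs = np.kron(A, C.T) @ vec(B)          # (A kron C') vec(B)
print(np.allclose(lhs, rhs))
```

The same check with column stacking would need the Kronecker factors in the opposite order, which is why the stacking convention has to be fixed before using the rule.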


Remark It follows immediately from Property (i) that

$$
\operatorname{vec}(AB) = (A\otimes I)\operatorname{vec}(B)
$$

$$
\operatorname{vec}(BC) = (I\otimes C')\operatorname{vec}(B)
$$


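Property (ii) $\operatorname{tr}(A'C) = [\operatorname{vec}(A)]'[\operatorname{vec}(C)]$

Proof We have

$$
\operatorname{tr}(A'C) = \sum_i\sum_j a'_{ij}c_{ji}
$$

where $a'_{ij}$ is the $(i, j)$th element of $A'$ and $c_{ji}$ is the $(j, i)$th element of $C$.
Hence

$$
\operatorname{tr}(A'C) = \sum_i\sum_j a_{ji}c_{ji} = [\operatorname{vec}(A)]'[\operatorname{vec}(C)]
$$

where $a_{ji}$ is the $(j, i)$th element of $A$.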
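Property (iii) $\operatorname{tr}(A'BCD) = [\operatorname{vec}(A)]'(B\otimes D')\operatorname{vec}(C)$

where $A'$, $B$, $C$ and $D$ are conformable matrices.

Proof From (ii) above it follows that

$$
\operatorname{tr}(A'BCD) = [\operatorname{vec}(A)]'\operatorname{vec}(BCD) = [\operatorname{vec}(A)]'(B\otimes D')\operatorname{vec}(C)
$$

and the last equality follows from (i) above.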
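
Properties (ii) and (iii) can be verified in the same way. The following Python sketch (NumPy assumed; dimensions and random matrices are arbitrary illustrations) uses row stacking throughout.

```python
import numpy as np

rng = np.random.default_rng(5)
n, m, p, q = 3, 2, 4, 3                 # arbitrary illustrative dimensions
B = rng.normal(size=(n, m))
C = rng.normal(size=(m, p))
D = rng.normal(size=(p, q))
A = rng.normal(size=(n, q))             # same shape as BCD, so A'BCD is square
E = rng.normal(size=(n, q))             # second matrix with the shape of A

def vec(M):
    """Stack the rows of M into one long vector."""
    return M.reshape(-1)

# Property (ii): tr(A'E) = vec(A)' vec(E)
prop2_lhs = np.trace(A.T @ E)
prop2_rhs = vec(A) @ vec(E)

# Property (iii): tr(A'BCD) = vec(A)' (B kron D') vec(C)
prop3_lhs = np.trace(A.T @ B @ C @ D)
prop3_rhs = vec(A) @ np.kron(B, D.T) @ vec(C)

print(np.isclose(prop2_lhs, prop2_rhs), np.isclose(prop3_lhs, prop3_rhs))
```

Property (iii) is the identity used in Solution 5.5 to pass from the trace expression (5.5.3) to its Kronecker product form.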


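Final Remark  The vec( ) operation as defined above is not the only way
of rearranging the elements of a matrix into a long vector. Another way
which is in common use (cf. Marcus, 1964) is to stack the columns of a
matrix. Thus, if A is n × m and we write A in the form
$A = [A_1, A_2, \ldots, A_m]$, where $A_i$ denotes the ith column of A, we can
define

\[
\overline{\operatorname{vec}}(A)
= \begin{bmatrix} A_1 \\ A_2 \\ \vdots \\ A_m \end{bmatrix}
= \begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{n1} \\ a_{12} \\ a_{22} \\ \vdots \\ a_{nm} \end{bmatrix}.
\]

Since the columns of A are the rows of A' and vec(A') stacks the rows of
A' it follows immediately that

\[
\overline{\operatorname{vec}}(A) = \operatorname{vec}(A').
\]

From this relationship between the $\overline{\operatorname{vec}}$( ) and vec( )
operations we can deduce the main rules for operating with
$\overline{\operatorname{vec}}$( ) on matrix products. For example, the rule
corresponding to Property (i) above is

\[
\overline{\operatorname{vec}}(ABC) = (C' \otimes A)\,\overline{\operatorname{vec}}(B)
\]

since

\[
\begin{aligned}
\overline{\operatorname{vec}}(ABC) &= \operatorname{vec}[(ABC)'] \\
&= \operatorname{vec}(C'B'A') \\
&= (C' \otimes A)\operatorname{vec}(B') \\
&= (C' \otimes A)\,\overline{\operatorname{vec}}(B).
\end{aligned}
\]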
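
The relationship between the row-stacking and column-stacking conventions can also be checked numerically. The following sketch assumes NumPy; `vec_rows` and `vec_cols` are hypothetical helper names for the two operations, with column stacking obtained from Fortran-order flattening.

```python
import numpy as np


def vec_rows(M):
    """Row-stacking vec( ), the convention used in this appendix."""
    return np.asarray(M).reshape(-1)


def vec_cols(M):
    """Column-stacking alternative described in the Final Remark."""
    return np.asarray(M).reshape(-1, order="F")  # Fortran order stacks columns


def check_column_stacking(n=3, m=4, p=2, q=5, seed=2):
    """Return True if both relations of the Final Remark hold for random matrices."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, m))
    B = rng.standard_normal((m, p))
    C = rng.standard_normal((p, q))

    # Stacking the columns of B is the same as stacking the rows of B'
    same = np.allclose(vec_cols(B), vec_rows(B.T))
    # Column-stacking counterpart of Property (i)
    rule = np.allclose(vec_cols(A @ B @ C), np.kron(C.T, A) @ vec_cols(B))
    return bool(same and rule)
```

Note that the Kronecker factors appear in the opposite order under the two conventions, so it matters which definition a given reference uses.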


B. MATRIX CALCULUS NOTATION 


Definition  If $\alpha$ is a p × 1 vector then by $\partial/\partial\alpha$
(sometimes written $d/d\alpha$) we mean the p × 1 vector operator
$(\partial/\partial\alpha_i)$.

If $a = a(\alpha)$ is a scalar function of the elements of $\alpha$,
$b = b(\alpha)$ is an n-vector $[b_i(\alpha)]$ whose elements $b_i(\alpha)$ are
functions of $\alpha$ and $A = A(\alpha)$ is an n × m matrix $[a_{ij}(\alpha)]$
whose elements $a_{ij}(\alpha)$ are functions of $\alpha$, then we use the
following notation


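\[
\frac{\partial a}{\partial\alpha}
= \begin{bmatrix}
\partial a(\alpha)/\partial\alpha_1 \\ \vdots \\ \partial a(\alpha)/\partial\alpha_p
\end{bmatrix},
\qquad
\frac{\partial a}{\partial\alpha'}
= \left[\frac{\partial a(\alpha)}{\partial\alpha_1}, \ldots, \frac{\partial a(\alpha)}{\partial\alpha_p}\right]
\]

\[
\frac{\partial^2 a}{\partial\alpha\,\partial\alpha'}
= \left(\frac{\partial^2 a(\alpha)}{\partial\alpha_i\,\partial\alpha_j}\right)_{p \times p}
\]

\[
\frac{\partial b}{\partial\alpha_i}
= \begin{bmatrix}
\partial b_1(\alpha)/\partial\alpha_i \\ \vdots \\ \partial b_n(\alpha)/\partial\alpha_i
\end{bmatrix},
\qquad
\frac{\partial b'}{\partial\alpha}
= \left(\frac{\partial b_j(\alpha)}{\partial\alpha_i}\right)_{p \times n}
\]

\[
\frac{\partial A}{\partial\alpha_i}
= \left(\frac{\partial a_{jk}(\alpha)}{\partial\alpha_i}\right)_{n \times m}
\]

\[
\frac{\partial \operatorname{vec}(A)}{\partial\alpha_i}
= \begin{bmatrix}
\partial a_{11}(\alpha)/\partial\alpha_i \\
\partial a_{12}(\alpha)/\partial\alpha_i \\
\vdots \\
\partial a_{1m}(\alpha)/\partial\alpha_i \\
\partial a_{21}(\alpha)/\partial\alpha_i \\
\vdots \\
\partial a_{nm}(\alpha)/\partial\alpha_i
\end{bmatrix}_{nm \times 1}
\]

and

\[
\frac{\partial [\operatorname{vec}(A)]'}{\partial\alpha}
= \left[\frac{\partial \operatorname{vec}(A)}{\partial\alpha_1},\;
\frac{\partial \operatorname{vec}(A)}{\partial\alpha_2},\; \ldots,\;
\frac{\partial \operatorname{vec}(A)}{\partial\alpha_p}\right]'_{p \times nm}
\]
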
Most of the rules for operating in matrix calculus can be found in the
main econometrics texts and will not be discussed here. We refer the
reader to Goldberger (1964, pp. 39-44), Theil (1971, pp. 30-33),
Malinvaud (1970b, pp. 196-198) and Fisk (1967, pp. 144-154). Dwyer
(1967) and Neudecker (1968) are also useful references in this area. One
rule we will find it useful to derive here is the following:


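If $A = A(\lambda)$ is a square non-singular matrix of order n whose elements
are differentiable functions of the scalar $\lambda$ then

\[
\frac{\partial \ln\det[A(\lambda)]}{\partial\lambda}
= \operatorname{tr}\left[A^{-1}(\lambda)\,\frac{\partial A(\lambda)}{\partial\lambda}\right].
\]
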


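Proof  We set $A = [a_{pq}]$. Then

\[
\begin{aligned}
\frac{\partial \ln\det[A(\lambda)]}{\partial\lambda}
&= \frac{\partial \ln\det[A(\lambda)]}{\partial \det A(\lambda)}\,
   \frac{\partial \det A(\lambda)}{\partial\lambda} \\
&= \frac{1}{\det A(\lambda)}\,\frac{\partial \det[A(\lambda)]}{\partial\lambda} \\
&= \frac{1}{\det A(\lambda)} \sum_{p=1}^{n} \sum_{q=1}^{n}
   \frac{\partial \det[A(\lambda)]}{\partial a_{pq}}\,
   \frac{\partial a_{pq}}{\partial\lambda} \\
&= \frac{1}{\det A(\lambda)} \sum_{p=1}^{n} \sum_{q=1}^{n}
   A_{pq}\,\frac{\partial a_{pq}}{\partial\lambda}
\end{aligned}
\]

where $A_{pq}$ is the cofactor of $a_{pq}$ in A. Note that the last line
follows because we have

\[
\det A = \sum_{q=1}^{n} a_{pq} A_{pq}
\]

and thus

\[
\frac{\partial \det A}{\partial a_{pq}} = A_{pq}.
\]

But $A_{pq}/\det A$ is the (q, p)th element of $A^{-1}$ and
$(\partial a_{pq}/\partial\lambda) = \partial A/\partial\lambda$, so that

\[
\frac{\partial \ln\det[A(\lambda)]}{\partial\lambda}
= \operatorname{tr}\left[A^{-1}(\lambda)\,\frac{\partial A(\lambda)}{\partial\lambda}\right] \tag{B.1}
\]

as required.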
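
Rule (B.1) can be checked against a finite-difference approximation. The sketch below assumes NumPy. The matrix function `A_of` and its derivative `dA_of` are hypothetical examples chosen only to be non-symmetric and non-singular with a positive determinant near the evaluation point.

```python
import numpy as np


def A_of(lam):
    """Hypothetical non-symmetric matrix whose elements depend on the scalar lam."""
    return np.array([
        [2.0 + lam, lam ** 2, 0.5],
        [0.3, 3.0 + np.sin(lam), lam],
        [lam, 0.1, 4.0 + lam ** 2],
    ])


def dA_of(lam):
    """Element-by-element derivative of A_of with respect to lam."""
    return np.array([
        [1.0, 2.0 * lam, 0.0],
        [0.0, np.cos(lam), 1.0],
        [1.0, 0.0, 2.0 * lam],
    ])


def rule_b1_gap(lam=0.7, h=1e-6):
    """Absolute gap between tr(A^{-1} dA/dlam) and a central difference of ln det A."""
    # tr(A^{-1} dA) computed via a linear solve instead of an explicit inverse
    analytic = np.trace(np.linalg.solve(A_of(lam), dA_of(lam)))
    # slogdet returns (sign, ln|det|); the determinant is positive here
    upper = np.linalg.slogdet(A_of(lam + h))[1]
    lower = np.linalg.slogdet(A_of(lam - h))[1]
    numeric = (upper - lower) / (2.0 * h)
    return abs(analytic - numeric)
```

The gap returned by `rule_b1_gap()` should be small and limited only by the finite-difference step and rounding error.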
Remark  It is important to realise that rule (B.1) holds whether or not
$A(\lambda)$ is a symmetric matrix. But, note that if $A(\lambda)$ is not
symmetric and we set $\lambda = a_{ij}$, the (i, j)th element of A, then
$\partial A(\lambda)/\partial\lambda$ has unity in the (i, j)th position and
zeros elsewhere. The rule then tells us that

\[
\frac{\partial \ln(\det A)}{\partial a_{ij}}
= \text{the } (i, j)\text{th element of } A'^{-1}.
\]

Thus

\[
\frac{\partial \ln(\det A)}{\partial A} = A'^{-1}.
\]
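
The element-by-element result in the Remark can be illustrated with a finite-difference gradient. This is a sketch assuming NumPy; the function name `logdet_gradient_fd` and the example matrix are hypothetical, and the matrix is deliberately non-symmetric so that the transpose in the result matters.

```python
import numpy as np


def logdet_gradient_fd(A, h=1e-6):
    """Central-difference approximation to the matrix of d ln(det A) / d a_ij."""
    A = np.asarray(A, dtype=float)
    G = np.zeros_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            E = np.zeros_like(A)
            E[i, j] = h  # perturb only the (i, j)th element
            upper = np.linalg.slogdet(A + E)[1]
            lower = np.linalg.slogdet(A - E)[1]
            G[i, j] = (upper - lower) / (2.0 * h)
    return G


# Hypothetical non-symmetric example with positive determinant
A_example = np.array([
    [4.0, 1.0, 0.5],
    [0.2, 3.0, 1.5],
    [0.7, 0.1, 5.0],
])

# Largest discrepancy between the numerical gradient and the inverse of A'
gap_transposed = np.max(np.abs(logdet_gradient_fd(A_example) - np.linalg.inv(A_example).T))
# For comparison: discrepancy if the transpose is (wrongly) omitted
gap_untransposed = np.max(np.abs(logdet_gradient_fd(A_example) - np.linalg.inv(A_example)))
```

For a non-symmetric matrix, `gap_transposed` should be close to zero while `gap_untransposed` is not, which is the point of the Remark.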